  2. LU-4398

mdt_object_open_lock() may not flush conflicting handles

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Severity: 3
    • Rank (Obsolete): 12075

    Description

      Calls to mdt_object_open_lock() that do not have MDS_OPEN_LOCK set in the open flags may fail to flush conflicting handles.

      t:~# rm /mnt/lustre/*
      t:~# ls /mnt/lustre/
      t:~# cp /bin/echo /mnt/lustre/echo
      t:~# lfs path2fid /mnt/lustre/echo
      [0x280000401:0x9:0x0]
      t:~# /mnt/lustre/echo Hi
      Hi
      t:~# lctl clear
      t:~# echo Bye > /mnt/lustre2/echo
      -bash: /mnt/lustre2/echo: Text file busy
      t:~# lctl dk > 2.dk
      
      00000004:00000002:0.0:1387473999.026142:0:15064:0:(mdt_open.c:1610:mdt_reint_open()) I am going to open [0x200000007:0x1:0x0]/(echo->[0x280000402:0x4:0x0]) cr_flag=01102 mode=0100666 msg_flag=0x0
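
The cr_flag and mode values in the mdt_reint_open() message are octal and can be decoded by hand or with a short script. The sketch below is only an illustration: the flag table is an assumption recalled from lustre_idl.h of that era (in particular the MDS_OPEN_LOCK value) and should be checked against the tree in use, and decode_open_flags() is a hypothetical helper, not part of Lustre.

```python
import stat

# ASSUMPTION: bit values recalled from lustre_idl.h (2.5 era); verify them
# against your own source tree before relying on this table.
MDS_OPEN_FLAGS = {
    0o1: "FMODE_READ",
    0o2: "FMODE_WRITE",
    0o100: "MDS_OPEN_CREAT",
    0o200: "MDS_OPEN_EXCL",
    0o1000: "MDS_OPEN_TRUNC",
    0o2000: "MDS_OPEN_APPEND",
    0o4000000000: "MDS_OPEN_LOCK",
}


def decode_open_flags(flags):
    """Return (names of known bits set in flags, remaining unknown bits)."""
    names = []
    unknown = flags
    for bit, name in sorted(MDS_OPEN_FLAGS.items()):
        if flags & bit:
            names.append(name)
            unknown &= ~bit
    return names, unknown


if __name__ == "__main__":
    cr_flag = int("01102", 8)   # cr_flag=01102 from the log line
    mode = int("0100666", 8)    # mode=0100666 from the log line
    names, unknown = decode_open_flags(cr_flag)
    print("cr_flag bits:", names, "unknown:", oct(unknown))
    print("MDS_OPEN_LOCK set:", "MDS_OPEN_LOCK" in names)
    print("mode:", stat.filemode(mode))
```

Under these assumed values, 01102 decodes to a write open with create and truncate and without MDS_OPEN_LOCK, which is the case the description is about.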
      
      ** The label in the CDEBUG for the next message is a typo: it says
      ** "lease count", but the value printed is really the open count
      ** (mot_open_count).
      
              CDEBUG(D_INODE, "normal open:"DFID" lease count: %d, lm: %d\n",
                     PFID(mdt_object_fid(obj)),
                     atomic_read(&obj->mot_open_count), lm);
      
      00000004:00000002:0.0:1387473999.026365:0:15064:0:(mdt_open.c:1242:mdt_object_open_lock()) normal open:[0x280000401:0x9:0x0] lease count: 1, lm: 16
      
      A CR lock with the LOOKUP and LAYOUT inodebits (ibits = 0x9) is granted:
      
      00000004:00000002:0.0:1387473999.026430:0:15064:0:(mdt_open.c:1269:mdt_object_open_lock()) Requested bits lock:[0x280000401:0x9:0x0], ibits = 0x9, open_flags = 01102, try_layout = 1, rc = 0
      00000004:00000001:0.0:1387473999.026433:0:15064:0:(mdt_open.c:1332:mdt_object_open_lock()) Process leaving via out (rc=0 : 0 : 0x0)
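
The ibits = 0x9 value in the "Requested bits lock" message can be split into individual inodebits as sketched below. The LOOKUP and LAYOUT values are consistent with the log itself; the UPDATE and OPEN values are assumptions recalled from lustre_idl.h and should be verified. decode_ibits() is a hypothetical helper name.

```python
# ASSUMPTION: inodebit values recalled from lustre_idl.h. LOOKUP (0x1) and
# LAYOUT (0x8) agree with the ticket, where ibits = 0x9 is described as a
# LOOKUP + LAYOUT lock; UPDATE and OPEN should be checked against the source.
INODELOCK_BITS = [
    (0x1, "LOOKUP"),
    (0x2, "UPDATE"),
    (0x4, "OPEN"),
    (0x8, "LAYOUT"),
]


def decode_ibits(ibits):
    """Return the names of the inodebits set in ibits, lowest bit first."""
    return [name for bit, name in INODELOCK_BITS if ibits & bit]


if __name__ == "__main__":
    granted = decode_ibits(0x9)
    print(granted)
    print("OPEN bit requested:", "OPEN" in granted)
```

With this table the OPEN bit is absent from the granted set, which fits the ticket title: the lock taken here does not conflict with open handles held elsewhere.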
      
      00000004:00000001:0.0:1387473999.026456:0:15064:0:(mdt_open.c:536:mdt_write_get()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
      00000004:00000001:0.0:1387473999.026458:0:15064:0:(mdt_open.c:723:mdt_mfd_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
      00000004:00000001:0.0:1387473999.026459:0:15064:0:(mdt_open.c:994:mdt_finish_open()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6)
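
The three "Process leaving" lines print the same return code as an unsigned 64-bit number, a signed number, and hex. A minimal sketch of that conversion follows; to_signed64() and describe_rc() are hypothetical helper names, and the errno lookup reflects whatever platform the script runs on (on Linux, errno 26 is ETXTBSY, matching the "Text file busy" error from the shell).

```python
import errno


def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def describe_rc(raw):
    """Return (signed rc, errno name or None) for a raw 64-bit rc value."""
    rc = to_signed64(raw)
    name = errno.errorcode.get(-rc) if rc < 0 else None
    return rc, name


if __name__ == "__main__":
    # rc=18446744073709551590 : -26 : ffffffffffffffe6 in the trace above
    print(describe_rc(18446744073709551590))
    print(describe_rc(0xFFFFFFFFFFFFFFE6))
```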
      

          Activity


            vitaly_fertman Vitaly Fertman added a comment - Is it still a problem after landing LU-4367 (llite: Make revalidate return 0 for opens)? If not, please consider re-landing.

            rfehren Roland Fehrenbacher added a comment - Sorry to sound a bit harsh, but I think poorly investigated last-minute commits that reintroduce known problems should be avoided at any cost. In my view, delaying the release and properly investigating the issue would have been the right way to handle this.
            cdufour Cédric Dufour added a comment (edited):

            Running 2.5.2 server-side with the patch reverted re-introduced LU-4520 (the ETXTBSY error); put bluntly, there was no way to edit or use scripts on Lustre.
            So we re-applied the patch to 2.5.2 to get rid of LU-4520 again.

            We ran performance tests before and after applying the patch and found no significant LU-5197-like performance regression:

            BEFORE
            $ mpirun -np 1 -host localhost ./mdtest-1.9.3 -n 100000 -i 1 -p 5 -u -v -F -d ./mdtest.out
            SUMMARY: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation     :       1191.533       1191.533       1191.533          0.000
               File stat         :        975.334        975.334        975.334          0.000
               File read         :       2502.797       2502.797       2502.797          0.000
               File removal      :       1299.404       1299.404       1299.404          0.000
               Tree creation     :       1286.991       1286.991       1286.991          0.000
               Tree removal      :        320.616        320.616        320.616          0.000
            
            $ mpirun -np 66 -hostfile ./hostfile ./mdtest-1.9.3 -n 10000 -i 1 -p 5 -u -v -F -d ./mdtest.out
            SUMMARY: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation     :      12310.369      12310.369      12310.369          0.000
               File stat         :      46494.079      46494.079      46494.079          0.000
               File read         :      75171.339      75171.339      75171.339          0.000
               File removal      :      11508.120      11508.120      11508.120          0.000
               Tree creation     :         41.568         41.568         41.568          0.000
               Tree removal      :         18.342         18.342         18.342          0.000
            
            AFTER
            $ mpirun -np 1 -host localhost ./mdtest-1.9.3 -n 100000 -i 1 -p 5 -u -v -F -d ./mdtest.out
            SUMMARY: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation     :       1179.328       1179.328       1179.328          0.000
               File stat         :        948.171        948.171        948.171          0.000
               File read         :       2472.216       2472.216       2472.216          0.000
               File removal      :       1147.826       1147.826       1147.826          0.000
               Tree creation     :        689.739        689.739        689.739          0.000
               Tree removal      :        131.545        131.545        131.545          0.000
            
            $ mpirun -np 66 -hostfile ./hostfile ./mdtest-1.9.3 -n 10000 -i 1 -p 5 -u -v -F -d ./mdtest.out
            SUMMARY: (of 1 iterations)
               Operation                      Max            Min           Mean        Std Dev
               ---------                      ---            ---           ----        -------
               File creation     :      13519.022      13519.022      13519.022          0.000
               File stat         :      46112.439      46112.439      46112.439          0.000
               File read         :      78350.583      78350.583      78350.583          0.000
               File removal      :      11139.655      11139.655      11139.655          0.000
               Tree creation     :         51.435         51.435         51.435          0.000
               Tree removal      :         29.340         29.340         29.340          0.000
            

            Shouldn't LU-4520 (bug/fix) have priority over LU-5197 (...) ?
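
The before/after comparison can be made explicit by computing the relative change per operation. The sketch below copies the Mean column from the mdtest summaries above; pct_change() and compare() are hypothetical helper names. Each run used a single iteration (-i 1), so the numbers give no measure of run-to-run noise and the percentages should be read with that in mind.

```python
# Mean ops/sec copied from the mdtest summaries above.
BEFORE_1 = {"File creation": 1191.533, "File stat": 975.334,
            "File read": 2502.797, "File removal": 1299.404,
            "Tree creation": 1286.991, "Tree removal": 320.616}
AFTER_1 = {"File creation": 1179.328, "File stat": 948.171,
           "File read": 2472.216, "File removal": 1147.826,
           "Tree creation": 689.739, "Tree removal": 131.545}
BEFORE_66 = {"File creation": 12310.369, "File stat": 46494.079,
             "File read": 75171.339, "File removal": 11508.120,
             "Tree creation": 41.568, "Tree removal": 18.342}
AFTER_66 = {"File creation": 13519.022, "File stat": 46112.439,
            "File read": 78350.583, "File removal": 11139.655,
            "Tree creation": 51.435, "Tree removal": 29.340}


def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def compare(before, after):
    """Map each operation to its percent change between two runs."""
    return {op: pct_change(before[op], after[op]) for op in before}


if __name__ == "__main__":
    for label, b, a in (("1 process", BEFORE_1, AFTER_1),
                        ("66 processes", BEFORE_66, AFTER_66)):
        print(label)
        for op, delta in compare(b, a).items():
            print(f"  {op:<14} {delta:+7.1f}%")
```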


            jlevi Jodi Levi (Inactive) added a comment - Patch was reverted due to performance regression
            adilger Andreas Dilger added a comment - Cherry pick to b2_5: http://review.whamcloud.com/10218
            bogl Bob Glossman (Inactive) added a comment - backport to b2_4: http://review.whamcloud.com/9826

            manish Manish Patel (Inactive) added a comment - Hi Jodi, we are seeing similar issues with Lustre 2.4.1 and 2.4.2. We opened ticket LU-4773, which was marked as a duplicate, so could we have a patch for the 2.4 branch? Thank you, Manish

            jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master. Please reopen ticket if more work is needed.
            jhammond John Hammond added a comment - Please see http://review.whamcloud.com/#/c/9063/ .

            People

              Assignee: guzheng Gu Zheng (Inactive)
              Reporter: jhammond John Hammond