[LU-594] 1.8<->2.1 interop: sanity test_27y: FAIL: files created on deactivated OSTs instead of degraded OST

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.1.0, Lustre 1.8.7
    • Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 4080

    Description

      After OSS1 (fat-amd-2) was upgraded, sanity test 27y run from Lustre 1.8.6-wc1 clients (client-[12,13]) failed as follows:

      == test 27y: create files while OST0 is degraded and the rest inactive == 20:40:14
      lustre-OST0001-osc is Deactivate:
      open(/mnt/lustre/d0.sanity/d27/f27y0) error: No space left on device
      total: 0 creates in 61.40 seconds: 0.00 creates/second
      llapi_semantic_traverse: Failed to open '/mnt/lustre/d0.sanity/d27/f27y0': No such file or directory (2)
      error: getstripe failed for /mnt/lustre/d0.sanity/d27/f27y0.
       sanity test_27y: @@@@@@ FAIL: files created on deactivated OSTs instead of degraded OST 
      Dumping lctl log to /home/yujian/test_logs/1313465731/sanity.test_27y.*.1313466087.log
      

      Maloo report: https://maloo.whamcloud.com/test_sets/124d3d8a-c7c5-11e0-8d02-52540025f9af
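
      For context, the failing subtest exercises roughly the following sequence (a hedged sketch of the test logic, not the exact sanity.sh code; device names, file counts and mount points are illustrative):

      # On the MDS: deactivate the OSC for every OST except OST0000 (illustrative).
      lctl --device %lustre-OST0001-osc deactivate

      # On the OSS: flag OST0000 as degraded.
      lctl set_param obdfilter.lustre-OST0000.degraded=1

      # On the client: create files; they should still be striped on the
      # degraded (but active) OST0000.
      createmany -o /mnt/lustre/d0.sanity/d27/f27y 5

      # Verify where the objects were placed.
      lfs getstripe /mnt/lustre/d0.sanity/d27/f27y0

      # Cleanup: clear the degraded flag and reactivate the OSC.
      lctl set_param obdfilter.lustre-OST0000.degraded=0
      lctl --device %lustre-OST0001-osc activate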



          Activity


            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #123
            LU-594 Make sure sanity 27v fallout clears

            Johann Lombardi: b113af75053c721c2540b1e4cac28599ddf84e22
            Files:

            • lustre/tests/sanity.sh

            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #123
            LU-594 Make sure sanity 27v fallout clears

            Johann Lombardi: b113af75053c721c2540b1e4cac28599ddf84e22
            Files:

            • lustre/tests/sanity.sh
            bobijam Zhenyu Xu added a comment -

            b1_8 patch tracking at http://review.whamcloud.com/1263
            master patch tracking at http://review.whamcloud.com/1262

            bobijam Zhenyu Xu added a comment -

            From OST0's dmesg log

            Lustre: DEBUG MARKER: == test 27v: skip object creation on slow OST ================= == 02:28:42
            LustreError: 6122:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
            LustreError: 6122:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
            LustreError: 6122:0:(fail.c:126:__cfs_fail_timeout_set()) cfs_fail_timeout id 705 sleeping for 50000 ms
            Lustre: DEBUG MARKER: == test 27w: check lfs setstripe -c -s -i options ============= == 02:29:27
            Lustre: DEBUG MARKER: setstripe /mnt/lustre/d0.sanity/d27/f1 -c 1 -i 0
            Lustre: DEBUG MARKER: setstripe /mnt/lustre/d0.sanity/d27/f2 -c 2 -i 1
            Lustre: DEBUG MARKER: == test 27x: create files while OST0 is degraded == 02:29:29
            Lustre: DEBUG MARKER: == test 27y: create files while OST0 is degraded and the rest inactive == 02:29:40
            LustreError: 6122:0:(fail.c:130:__cfs_fail_timeout_set()) cfs_fail_timeout id 705 awake
            LustreError: 6121:0:(filter.c:3937:filter_precreate()) create failed rc = -28

            test 27y was affected by test 27v, which set fail_loc to 0x705 and held up the OST's object creation.
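
            The fix tracked above ("Make sure sanity 27v fallout clears") touches lustre/tests/sanity.sh. A minimal sketch of the idea, assuming the test framework's usual do_facet/lctl helpers (the exact wait logic in the landed patch may differ):

            # Reset the fail_loc injected by test_27v so the OST stops delaying
            # object creation, then give the delayed precreate time to finish
            # before the following subtests start.
            do_facet ost1 "lctl set_param fail_loc=0"
            sleep 60   # illustrative; the landed patch may wait on a concrete condition instead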

            yujian Jian Yu added a comment -

            The MDS log shows little useful info; would you mind setting the MDS debug level to -1 and trying to reproduce it?

            Please look into the following report:
            https://maloo.whamcloud.com/test_sets/3e194ae0-c8b4-11e0-8d02-52540025f9af

            I used the latest master code (Jenkins build #259).

            bobijam Zhenyu Xu added a comment -

            Yu Jian,

            The MDS log shows little useful info; would you mind setting the MDS debug level to -1 and trying to reproduce it? I suspect that the oscc precreate on the MDS got an erroneous/incompatible reply packet and looped forever until there was no inode space left on OST0.
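
            For reference, collecting a full MDS debug log for such a run would look roughly like this (standard lctl debug commands; the buffer size is illustrative):

            # On the MDS: enable all debug flags and enlarge the debug buffer.
            lctl set_param debug=-1
            lctl set_param debug_mb=512

            lctl clear                     # drop stale debug messages
            # ... reproduce the failure (e.g. run sanity 27u-27y) ...
            lctl dk > /tmp/mds-debug.log   # dump the kernel debug buffer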

            bobijam Zhenyu Xu added a comment -

            Observed the MDS keep issuing object create requests to OST0; each time the requested object id is 32 objects beyond the last id on OST0, until the inode space on OST0 is exhausted.

            00000100:00100000:4.0:1313485047.653387:0:19932:0:(service.c:1705:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_creat_01:lustre-mdtlov_UUID+5:5717:x1377267856994752:12345-10.10.4.132@tcp:5
            00000010:00000001:4.0:1313485047.653389:0:19932:0:(ost_handler.c:2081:ost_handle()) Process entered
            ...
            00002000:00000001:4.0:1313485047.653401:0:19932:0:(filter.c:4018:filter_create()) Process entered
            00002000:00000002:4.0:1313485047.653402:0:19932:0:(filter.c:4021:filter_create()) lustre-OST0000: filter_create(group=0,id=71331)
            00002000:00000001:4.0:1313485047.653404:0:19932:0:(filter.c:3626:filter_handle_precreate()) Process entered
            00002000:00100000:4.0:1313485047.653405:0:19932:0:(filter.c:3680:filter_handle_precreate()) filter_last_id() = 71299 -> diff = 32
            ...
            00000010:00000001:4.0:1313485047.654841:0:19932:0:(obd_class.h:802:obd_create()) Process leaving (rc=0 : 0 : 0)
            00000010:00000001:4.0:1313485047.654842:0:19932:0:(ost_handler.c:346:ost_create()) Process leaving (rc=0 : 0 : 0)
            00000010:00000001:4.0:1313485047.654843:0:19932:0:(ost_handler.c:2340:ost_handle()) Process leaving
            ...

            00000100:00100000:4.0:1313485047.655668:0:19932:0:(service.c:1705:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_creat_01:lustre-mdtlov_UUID+5:5717:x1377267856994753:12345-10.10.4.132@tcp:5
            00000010:00000001:4.0:1313485047.655670:0:19932:0:(ost_handler.c:2081:ost_handle()) Process entered
            00000010:00000002:4.0:1313485047.655671:0:19932:0:(ost_handler.c:2150:ost_handle()) create
            00000010:00000001:4.0:1313485047.655672:0:19932:0:(ost_handler.c:326:ost_create()) Process entered
            ...
            00002000:00000002:4.0:1313485047.655683:0:19932:0:(filter.c:4021:filter_create()) lustre-OST0000: filter_create(group=0,id=71363)
            00002000:00000001:4.0:1313485047.655685:0:19932:0:(filter.c:3626:filter_handle_precreate()) Process entered
            00002000:00100000:4.0:1313485047.655686:0:19932:0:(filter.c:3680:filter_handle_precreate()) filter_last_id() = 71331 -> diff = 32
            ...
            ...
            ...
            00000100:00100000:5.0:1313485088.784264:0:19931:0:(service.c:1705:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_creat_00:lustre-mdtlov_UUID+5:5717:x1377267857012106:12345-10.10.4.132@tcp:5
            00000010:00000001:5.0:1313485088.784266:0:19931:0:(ost_handler.c:2081:ost_handle()) Process entered
            00000010:00000002:5.0:1313485088.784268:0:19931:0:(ost_handler.c:2150:ost_handle()) create
            00000010:00000001:5.0:1313485088.784269:0:19931:0:(ost_handler.c:326:ost_create()) Process entered
            ...
            00002000:00000002:5.0:1313485088.784280:0:19931:0:(filter.c:4021:filter_create()) lustre-OST0000: filter_create(group=0,id=626403)
            00002000:00000001:5.0:1313485088.784282:0:19931:0:(filter.c:3626:filter_handle_precreate()) Process entered
            00002000:00100000:5.0:1313485088.784283:0:19931:0:(filter.c:3680:filter_handle_precreate()) filter_last_id() = 626371 -> diff = 32
            ...
            00002000:00000001:5.0:1313485088.799038:0:19931:0:(filter.c:4086:filter_create()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
            00000010:00000001:5.0:1313485088.799040:0:19931:0:(obd_class.h:802:obd_create()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
            00000010:00000001:5.0:1313485088.799043:0:19931:0:(ost_handler.c:346:ost_create()) Process leaving (rc=0 : 0 : 0)
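
            One way to watch this runaway from the OSS side is to poll the obdfilter last_id counter for OST0000 while the test runs (a hedged sketch; parameter path per the 1.8/2.x obdfilter proc layout):

            # On the OSS: print the last precreated object id every few seconds.
            while true; do
                lctl get_param obdfilter.lustre-OST0000.last_id
                sleep 5
            done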

            yujian Jian Yu added a comment -

            Please take a look at this report: https://maloo.whamcloud.com/test_sets/7a2423ee-c7e6-11e0-8d02-52540025f9af
            The issue can be reproduced by running sanity test 27 starting from subtest 27u.
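
            Running that reproduction from the test framework would look roughly like this (a sketch; the install path and the exact value ONLY accepts for a set of 27 subtests may vary between branches):

            # On a client with the Lustre test scripts installed:
            cd /usr/lib64/lustre/tests
            ONLY="27u 27v 27w 27x 27y" ./sanity.sh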

            yujian Jian Yu added a comment -

            Before running sanity test 27:

            ----------------
            client-[12-13]
            ----------------
            UUID                       bytes        Used   Available Use% Mounted on
            lustre-MDT0000_UUID         3.3G      163.9M        3.0G   5% /mnt/lustre[MDT:0]
            lustre-OST0000_UUID         9.4G      409.7M        8.5G   4% /mnt/lustre[OST:0]
            lustre-OST0001_UUID         9.4G      409.7M        8.5G   4% /mnt/lustre[OST:1]
            
            filesystem summary:        18.8G      819.3M       17.0G   4% /mnt/lustre
            
            UUID                      Inodes       IUsed       IFree IUse% Mounted on
            lustre-MDT0000_UUID       832967          28      832939   0% /mnt/lustre[MDT:0]
            lustre-OST0000_UUID       625856          58      625798   0% /mnt/lustre[OST:0]
            lustre-OST0001_UUID       625856          56      625800   0% /mnt/lustre[OST:1]
            
            filesystem summary:       832967          28      832939   0% /mnt/lustre
            

            After running sanity test 27:

            ----------------
            client-[12-13]
            ----------------
            UUID                       bytes        Used   Available Use% Mounted on
            lustre-MDT0000_UUID         3.3G      164.1M        3.0G   5% /mnt/lustre[MDT:0]
            lustre-OST0000_UUID         9.4G      425.0M        8.5G   5% /mnt/lustre[OST:0]
            lustre-OST0001_UUID         9.4G      409.7M        8.5G   4% /mnt/lustre[OST:1]
            
            filesystem summary:        18.8G      834.7M       17.0G   5% /mnt/lustre
            
            UUID                      Inodes       IUsed       IFree IUse% Mounted on
            lustre-MDT0000_UUID       833019         121      832898   0% /mnt/lustre[MDT:0]
            lustre-OST0000_UUID       625856      625856           0 100% /mnt/lustre[OST:0]
            lustre-OST0001_UUID       625856         114      625742   0% /mnt/lustre[OST:1]
            
            filesystem summary:       833019         121      832898   0% /mnt/lustre
            

            IFree became 0 on lustre-OST0000_UUID.

            bobijam Zhenyu Xu added a comment - edited
            OST1 debug log

            00002000:00020000:15.0:1313466086.875951:0:11221:0:(filter.c:3937:filter_precreate()) create failed rc = -28
            00002000:00000004:15.0:1313466086.882940:0:11221:0:(lustre_fsfilt.h:401:fsfilt_statfs()) osfs 4295191894, max_age 4295193783
            00002000:00020000:15.0:1313466086.882948:0:11221:0:(filter.c:3942:filter_precreate()) lustre-OST0000: free inode 0

            OST1 has no free inodes left to create objects, which causes the error.
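
            This can be confirmed directly on the OSS (a sketch using standard obdfilter parameters):

            # Free vs. total inodes for the exhausted OST, as seen by the OSS.
            lctl get_param obdfilter.lustre-OST0000.filesfree \
                           obdfilter.lustre-OST0000.filestotal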


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: yujian Jian Yu
              Votes: 0
              Watchers: 1
