
1.8<->2.1 interop: sanity test_27y: FAIL: files created on deactivated OSTs instead of degraded OST

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.1.0, Lustre 1.8.7
    • Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
    • Labels: None
    • Severity: 3
    • Rank: 4080

    Description

      After upgrading OSS1 (fat-amd-2), running sanity test 27y on Lustre 1.8.6-wc1 clients (client-[12,13]) failed as follows:

      == test 27y: create files while OST0 is degraded and the rest inactive == 20:40:14
      lustre-OST0001-osc is Deactivate:
      open(/mnt/lustre/d0.sanity/d27/f27y0) error: No space left on device
      total: 0 creates in 61.40 seconds: 0.00 creates/second
      llapi_semantic_traverse: Failed to open '/mnt/lustre/d0.sanity/d27/f27y0': No such file or directory (2)
      error: getstripe failed for /mnt/lustre/d0.sanity/d27/f27y0.
       sanity test_27y: @@@@@@ FAIL: files created on deactivated OSTs instead of degraded OST 
      Dumping lctl log to /home/yujian/test_logs/1313465731/sanity.test_27y.*.1313466087.log
      

      Maloo report: https://maloo.whamcloud.com/test_sets/124d3d8a-c7c5-11e0-8d02-52540025f9af


          Activity


            hudson Build Master (Inactive) added a comment -

            Integrated in lustre-b1_8 #123 on the following build configurations:

            • i686,client,el5,inkernel
            • x86_64,server,el5,inkernel
            • i686,client,el5,ofa
            • x86_64,client,el6,inkernel
            • x86_64,client,el5,inkernel

            LU-594 Make sure sanity 27v fallout clears

            Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
            Files :

            • lustre/tests/sanity.sh
            bobijam Zhenyu Xu added a comment -

            b1_8 patch tracking at http://review.whamcloud.com/1263
            master patch tracking at http://review.whamcloud.com/1262

            bobijam Zhenyu Xu added a comment -

            From OST0's dmesg log

            Lustre: DEBUG MARKER: == test 27v: skip object creation on slow OST ================= == 02:28:42
            LustreError: 6122:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
            LustreError: 6122:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
            LustreError: 6122:0:(fail.c:126:__cfs_fail_timeout_set()) cfs_fail_timeout id 705 sleeping for 50000 ms
            Lustre: DEBUG MARKER: == test 27w: check lfs setstripe -c -s -i options ============= == 02:29:27
            Lustre: DEBUG MARKER: setstripe /mnt/lustre/d0.sanity/d27/f1 -c 1 -i 0
            Lustre: DEBUG MARKER: setstripe /mnt/lustre/d0.sanity/d27/f2 -c 2 -i 1
            Lustre: DEBUG MARKER: == test 27x: create files while OST0 is degraded == 02:29:29
            Lustre: DEBUG MARKER: == test 27y: create files while OST0 is degraded and the rest inactive == 02:29:40
            LustreError: 6122:0:(fail.c:130:__cfs_fail_timeout_set()) cfs_fail_timeout id 705 awake
            LustreError: 6121:0:(filter.c:3937:filter_precreate()) create failed rc = -28

            Test 27y was affected by test 27v, which set fail_loc to 0x705, blocking the OST's object creation.
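
            The fix ("Make sure sanity 27v fallout clears") amounts to resetting the fault-injection state before the next subtest runs. A minimal, hypothetical sketch of that kind of cleanup follows; the `reset_fail_loc` helper name is invented here, and the real sanity.sh change may also wait for the pending precreate to drain:

```shell
# Hypothetical cleanup helper: clear any fail_loc left over from a
# previous subtest so fault injection cannot leak into later tests.
# (Illustrative sketch only, not the actual sanity.sh patch.)
reset_fail_loc() {
    # "lctl set_param fail_loc=0" disables all injected faults;
    # errors are ignored so this sketch also runs where lctl is absent.
    lctl set_param fail_loc=0 fail_val=0 >/dev/null 2>&1 || true
    echo "fail_loc cleared"
}
reset_fail_loc
```

Calling such a helper at the end of test 27v (or at the start of 27x/27y) prevents the 0x705 timeout from stalling object creation in later subtests.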

            yujian Jian Yu added a comment -

            The MDS log shows little useful info; would you mind setting the MDS debug level to -1 and trying to reproduce it?

            Please look into the following report:
            https://maloo.whamcloud.com/test_sets/3e194ae0-c8b4-11e0-8d02-52540025f9af

            I used the latest master code (Jenkins build #259).

            bobijam Zhenyu Xu added a comment -

            Yu Jian,

            The MDS log shows little useful info; would you mind setting the MDS debug level to -1 and trying to reproduce it? I suspect that the OSCC precreate on the MDS got an erroneous/incompatible reply packet and loops forever until there is no inode space left on OST0.

            bobijam Zhenyu Xu added a comment -

            Observed that the MDS keeps issuing object creates on OST0; each time the requested object ID is 32 beyond the last ID on OST0, until the inode space on OST0 is exhausted.

            00000100:00100000:4.0:1313485047.653387:0:19932:0:(service.c:1705:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_creat_01:lustre-mdtlov_UUID+5:5717:x1377267856994752:12345-10.10.4.132@tcp:5
            00000010:00000001:4.0:1313485047.653389:0:19932:0:(ost_handler.c:2081:ost_handle()) Process entered
            ...
            00002000:00000001:4.0:1313485047.653401:0:19932:0:(filter.c:4018:filter_create()) Process entered
            00002000:00000002:4.0:1313485047.653402:0:19932:0:(filter.c:4021:filter_create()) lustre-OST0000: filter_create(group=0,id=71331)
            00002000:00000001:4.0:1313485047.653404:0:19932:0:(filter.c:3626:filter_handle_precreate()) Process entered
            00002000:00100000:4.0:1313485047.653405:0:19932:0:(filter.c:3680:filter_handle_precreate()) filter_last_id() = 71299 -> diff = 32
            ...
            00000010:00000001:4.0:1313485047.654841:0:19932:0:(obd_class.h:802:obd_create()) Process leaving (rc=0 : 0 : 0)
            00000010:00000001:4.0:1313485047.654842:0:19932:0:(ost_handler.c:346:ost_create()) Process leaving (rc=0 : 0 : 0)
            00000010:00000001:4.0:1313485047.654843:0:19932:0:(ost_handler.c:2340:ost_handle()) Process leaving
            ...

            00000100:00100000:4.0:1313485047.655668:0:19932:0:(service.c:1705:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_creat_01:lustre-mdtlov_UUID+5:5717:x1377267856994753:12345-10.10.4.132@tcp:5
            00000010:00000001:4.0:1313485047.655670:0:19932:0:(ost_handler.c:2081:ost_handle()) Process entered
            00000010:00000002:4.0:1313485047.655671:0:19932:0:(ost_handler.c:2150:ost_handle()) create
            00000010:00000001:4.0:1313485047.655672:0:19932:0:(ost_handler.c:326:ost_create()) Process entered
            ...
            00002000:00000002:4.0:1313485047.655683:0:19932:0:(filter.c:4021:filter_create()) lustre-OST0000: filter_create(group=0,id=71363)
            00002000:00000001:4.0:1313485047.655685:0:19932:0:(filter.c:3626:filter_handle_precreate()) Process entered
            00002000:00100000:4.0:1313485047.655686:0:19932:0:(filter.c:3680:filter_handle_precreate()) filter_last_id() = 71331 -> diff = 32
            ...
            ...
            ...
            00000100:00100000:5.0:1313485088.784264:0:19931:0:(service.c:1705:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_creat_00:lustre-mdtlov_UUID+5:5717:x1377267857012106:12345-10.10.4.132@tcp:5
            00000010:00000001:5.0:1313485088.784266:0:19931:0:(ost_handler.c:2081:ost_handle()) Process entered
            00000010:00000002:5.0:1313485088.784268:0:19931:0:(ost_handler.c:2150:ost_handle()) create
            00000010:00000001:5.0:1313485088.784269:0:19931:0:(ost_handler.c:326:ost_create()) Process entered
            ...
            00002000:00000002:5.0:1313485088.784280:0:19931:0:(filter.c:4021:filter_create()) lustre-OST0000: filter_create(group=0,id=626403)
            00002000:00000001:5.0:1313485088.784282:0:19931:0:(filter.c:3626:filter_handle_precreate()) Process entered
            00002000:00100000:5.0:1313485088.784283:0:19931:0:(filter.c:3680:filter_handle_precreate()) filter_last_id() = 626371 -> diff = 32
            ...
            00002000:00000001:5.0:1313485088.799038:0:19931:0:(filter.c:4086:filter_create()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
            00000010:00000001:5.0:1313485088.799040:0:19931:0:(obd_class.h:802:obd_create()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
            00000010:00000001:5.0:1313485088.799043:0:19931:0:(ost_handler.c:346:ost_create()) Process leaving (rc=0 : 0 : 0)

            yujian Jian Yu added a comment -

            Please take a look at this report: https://maloo.whamcloud.com/test_sets/7a2423ee-c7e6-11e0-8d02-52540025f9af
            The issue could be reproduced by running sanity test 27 from subtest 27u.


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: yujian Jian Yu
              Votes: 0
              Watchers: 1
