[LU-594] 1.8<->2.1 interop: sanity test_27y: FAIL: files created on deactivated OSTs instead of degraded OST Created: 16/Aug/11  Updated: 29/Aug/11  Resolved: 29/Aug/11

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
Fix Version/s: Lustre 2.1.0, Lustre 1.8.7

Type: Bug Priority: Minor
Reporter: Jian Yu Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None
Environment:

Old Lustre Version: 1.8.6-wc1
Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/100/

New Lustre Version: master
Lustre Build: http://newbuild.whamcloud.com/job/lustre-master/257/

Rolling upgrading (Lustre servers and clients were upgraded one by one without unmounting others) from Lustre 1.8.6-wc1 to Lustre master under the following configuration:

OSS1: RHEL5/x86_64
OSS2: RHEL5/x86_64
MDS: RHEL5/x86_64
Client1: RHEL6/x86_64
Client2: RHEL5/x86_64


Severity: 3
Rank (Obsolete): 4080

 Description   

After upgrading OSS1 (fat-amd-2), running sanity test 27y on Lustre 1.8.6-wc1 clients (client-[12,13]) failed as follows:

== test 27y: create files while OST0 is degraded and the rest inactive == 20:40:14
lustre-OST0001-osc is Deactivate:
open(/mnt/lustre/d0.sanity/d27/f27y0) error: No space left on device
total: 0 creates in 61.40 seconds: 0.00 creates/second
llapi_semantic_traverse: Failed to open '/mnt/lustre/d0.sanity/d27/f27y0': No such file or directory (2)
error: getstripe failed for /mnt/lustre/d0.sanity/d27/f27y0.
 sanity test_27y: @@@@@@ FAIL: files created on deactivated OSTs instead of degraded OST 
Dumping lctl log to /home/yujian/test_logs/1313465731/sanity.test_27y.*.1313466087.log

Maloo report: https://maloo.whamcloud.com/test_sets/124d3d8a-c7c5-11e0-8d02-52540025f9af



 Comments   
Comment by Zhenyu Xu [ 16/Aug/11 ]
OST1 debug log

00002000:00020000:15.0:1313466086.875951:0:11221:0:(filter.c:3937:filter_precreate()) create failed rc = -28
00002000:00000004:15.0:1313466086.882940:0:11221:0:(lustre_fsfilt.h:401:fsfilt_statfs()) osfs 4295191894, max_age 4295193783
00002000:00020000:15.0:1313466086.882948:0:11221:0:(filter.c:3942:filter_precreate()) lustre-OST0000: free inode 0

OST1 have no free files to create object which causes the error.

Comment by Jian Yu [ 16/Aug/11 ]

Before running sanity test 27:

----------------
client-[12-13]
----------------
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID         3.3G      163.9M        3.0G   5% /mnt/lustre[MDT:0]
lustre-OST0000_UUID         9.4G      409.7M        8.5G   4% /mnt/lustre[OST:0]
lustre-OST0001_UUID         9.4G      409.7M        8.5G   4% /mnt/lustre[OST:1]

filesystem summary:        18.8G      819.3M       17.0G   4% /mnt/lustre

UUID                      Inodes       IUsed       IFree IUse% Mounted on
lustre-MDT0000_UUID       832967          28      832939   0% /mnt/lustre[MDT:0]
lustre-OST0000_UUID       625856          58      625798   0% /mnt/lustre[OST:0]
lustre-OST0001_UUID       625856          56      625800   0% /mnt/lustre[OST:1]

filesystem summary:       832967          28      832939   0% /mnt/lustre

After running sanity test 27:

----------------
client-[12-13]
----------------
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID         3.3G      164.1M        3.0G   5% /mnt/lustre[MDT:0]
lustre-OST0000_UUID         9.4G      425.0M        8.5G   5% /mnt/lustre[OST:0]
lustre-OST0001_UUID         9.4G      409.7M        8.5G   4% /mnt/lustre[OST:1]

filesystem summary:        18.8G      834.7M       17.0G   5% /mnt/lustre

UUID                      Inodes       IUsed       IFree IUse% Mounted on
lustre-MDT0000_UUID       833019         121      832898   0% /mnt/lustre[MDT:0]
lustre-OST0000_UUID       625856      625856           0 100% /mnt/lustre[OST:0]
lustre-OST0001_UUID       625856         114      625742   0% /mnt/lustre[OST:1]

filesystem summary:       833019         121      832898   0% /mnt/lustre

IFree became 0 on lustre-OST0000_UUID.

Comment by Jian Yu [ 16/Aug/11 ]

Please take a look at this report: https://maloo.whamcloud.com/test_sets/7a2423ee-c7e6-11e0-8d02-52540025f9af
The issue could be reproduced by running sanity test 27 from subtest 27u.

Comment by Zhenyu Xu [ 16/Aug/11 ]

observed MDS keep on issue object create on OST0, each time the requested object id is 32 objects over what the last id on OST0, until exhausted inode space on OST0.

00000100:00100000:4.0:1313485047.653387:0:19932:0:(service.c:1705:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_creat_01:lustre-mdtlov_UUID+5:5717:x1377267856994752:12345-10.10.4.132@tcp:5
00000010:00000001:4.0:1313485047.653389:0:19932:0:(ost_handler.c:2081:ost_handle()) Process entered
...
00002000:00000001:4.0:1313485047.653401:0:19932:0:(filter.c:4018:filter_create()) Process entered
00002000:00000002:4.0:1313485047.653402:0:19932:0:(filter.c:4021:filter_create()) lustre-OST0000: filter_create(group=0,id=71331)
00002000:00000001:4.0:1313485047.653404:0:19932:0:(filter.c:3626:filter_handle_precreate()) Process entered
00002000:00100000:4.0:1313485047.653405:0:19932:0:(filter.c:3680:filter_handle_precreate()) filter_last_id() = 71299 -> diff = 32
...
00000010:00000001:4.0:1313485047.654841:0:19932:0:(obd_class.h:802:obd_create()) Process leaving (rc=0 : 0 : 0)
00000010:00000001:4.0:1313485047.654842:0:19932:0:(ost_handler.c:346:ost_create()) Process leaving (rc=0 : 0 : 0)
00000010:00000001:4.0:1313485047.654843:0:19932:0:(ost_handler.c:2340:ost_handle()) Process leaving
...

00000100:00100000:4.0:1313485047.655668:0:19932:0:(service.c:1705:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_creat_01:lustre-mdtlov_UUID+5:5717:x1377267856994753:12345-10.10.4.132@tcp:5
00000010:00000001:4.0:1313485047.655670:0:19932:0:(ost_handler.c:2081:ost_handle()) Process entered
00000010:00000002:4.0:1313485047.655671:0:19932:0:(ost_handler.c:2150:ost_handle()) create
00000010:00000001:4.0:1313485047.655672:0:19932:0:(ost_handler.c:326:ost_create()) Process entered
...
00002000:00000002:4.0:1313485047.655683:0:19932:0:(filter.c:4021:filter_create()) lustre-OST0000: filter_create(group=0,id=71363)
00002000:00000001:4.0:1313485047.655685:0:19932:0:(filter.c:3626:filter_handle_precreate()) Process entered
00002000:00100000:4.0:1313485047.655686:0:19932:0:(filter.c:3680:filter_handle_precreate()) filter_last_id() = 71331 -> diff = 32
...
...
...
00000100:00100000:5.0:1313485088.784264:0:19931:0:(service.c:1705:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ll_ost_creat_00:lustre-mdtlov_UUID+5:5717:x1377267857012106:12345-10.10.4.132@tcp:5
00000010:00000001:5.0:1313485088.784266:0:19931:0:(ost_handler.c:2081:ost_handle()) Process entered
00000010:00000002:5.0:1313485088.784268:0:19931:0:(ost_handler.c:2150:ost_handle()) create
00000010:00000001:5.0:1313485088.784269:0:19931:0:(ost_handler.c:326:ost_create()) Process entered
...
00002000:00000002:5.0:1313485088.784280:0:19931:0:(filter.c:4021:filter_create()) lustre-OST0000: filter_create(group=0,id=626403)
00002000:00000001:5.0:1313485088.784282:0:19931:0:(filter.c:3626:filter_handle_precreate()) Process entered
00002000:00100000:5.0:1313485088.784283:0:19931:0:(filter.c:3680:filter_handle_precreate()) filter_last_id() = 626371 -> diff = 32
...
00002000:00000001:5.0:1313485088.799038:0:19931:0:(filter.c:4086:filter_create()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
00000010:00000001:5.0:1313485088.799040:0:19931:0:(obd_class.h:802:obd_create()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
00000010:00000001:5.0:1313485088.799043:0:19931:0:(ost_handler.c:346:ost_create()) Process leaving (rc=0 : 0 : 0)

Comment by Zhenyu Xu [ 16/Aug/11 ]

Yu Jian,

The MDS log shows little useful info, would you mind set MDS debug level as -1 and try to reproduce it? I suspect that oscc precreate on MDS got erroneous/imcompatible reply package and loop forever until there is no inode space on OST0.

Comment by Jian Yu [ 17/Aug/11 ]

The MDS log shows little useful info, would you mind set MDS debug level as -1 and try to reproduce it?

Please look into the following report:
https://maloo.whamcloud.com/test_sets/3e194ae0-c8b4-11e0-8d02-52540025f9af

I used the latest master codes (Jenkins build #259).

Comment by Zhenyu Xu [ 18/Aug/11 ]

From OST0's dmesg log

Lustre: DEBUG MARKER: == test 27v: skip object creation on slow OST ================= == 02:28:42
LustreError: 6122:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
LustreError: 6122:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
LustreError: 6122:0:(fail.c:126:__cfs_fail_timeout_set()) cfs_fail_timeout id 705 sleeping for 50000 ms
Lustre: DEBUG MARKER: == test 27w: check lfs setstripe -c -s -i options ============= == 02:29:27
Lustre: DEBUG MARKER: setstripe /mnt/lustre/d0.sanity/d27/f1 -c 1 -i 0
Lustre: DEBUG MARKER: setstripe /mnt/lustre/d0.sanity/d27/f2 -c 2 -i 1
Lustre: DEBUG MARKER: == test 27x: create files while OST0 is degraded == 02:29:29
Lustre: DEBUG MARKER: == test 27y: create files while OST0 is degraded and the rest inactive == 02:29:40
LustreError: 6122:0:(fail.c:130:__cfs_fail_timeout_set()) cfs_fail_timeout id 705 awake
LustreError: 6121:0:(filter.c:3937:filter_precreate()) create failed rc = -28

test27y was affected by test27v which set fail_loc to 0x705 holding OST’s object creation.

Comment by Zhenyu Xu [ 18/Aug/11 ]

b1_8 patch tracking at http://review.whamcloud.com/1263
master patch tracking at http://review.whamcloud.com/1262

Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,inkernel #123
LU-594 Make sure sanity 27v fallout clears

Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-b1_8 » x86_64,client,el6,inkernel #123
LU-594 Make sure sanity 27v fallout clears

Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-b1_8 » i686,client,el5,ofa #123
LU-594 Make sure sanity 27v fallout clears

Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,inkernel #123
LU-594 Make sure sanity 27v fallout clears

Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-b1_8 » i686,client,el5,inkernel #123
LU-594 Make sure sanity 27v fallout clears

Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-b1_8 » x86_64,client,el5,ofa #123
LU-594 Make sure sanity 27v fallout clears

Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-b1_8 » i686,client,el6,inkernel #123
LU-594 Make sure sanity 27v fallout clears

Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-b1_8 » x86_64,client,ubuntu1004,inkernel #123
LU-594 Make sure sanity 27v fallout clears

Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-b1_8 » x86_64,server,el5,ofa #123
LU-594 Make sure sanity 27v fallout clears

Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-b1_8 » i686,server,el5,inkernel #123
LU-594 Make sure sanity 27v fallout clears

Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-b1_8 » i686,server,el5,ofa #123
LU-594 Make sure sanity 27v fallout clears

Johann Lombardi : b113af75053c721c2540b1e4cac28599ddf84e22
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » i686,client,el5,ofa #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Comment by Build Master (Inactive) [ 29/Aug/11 ]

Integrated in lustre-master » i686,server,el5,ofa #274
LU-594 Make sure sanity 27v fallout clears

Oleg Drokin : ce95c918eb48bdb5fb910f2d75062fbca27ddc47
Files :

  • lustre/tests/sanity.sh
Generated at Sat Feb 10 01:08:34 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.