LU-3645: Interop 2.1.5 <--> 2.4: Write operations during failover error out instead of stalling

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • None
    • Affects Versions: Lustre 2.4.0, Lustre 2.1.5
    • 3
    • 9382

    Description

      During acceptance testing for KIT, they tried OSS failover while running several applications, and the applications got I/O errors ("can't create file" and similar messages). This should not happen; I/O should simply stall until failover completes.
      The clients were running 2.4 and the servers 2.1.5. We tried with 2.1.5 clients and did not see this issue. I have attached the client and server logs.
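
      For reference, a minimal user-space sketch of the kind of workload that exposes the symptom: it repeatedly creates and writes files and reports any error instead of blocking. This is not taken from the ticket; the mount point /mnt/lustre and the iteration count are illustrative assumptions.

      /* Illustrative workload sketch: create/write in a loop and report errors.
       * During OSS failover the expected behaviour is that open()/write() block,
       * not that they fail; any error printed here reproduces the symptom. */
      #include <errno.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>

      int main(void)
      {
              char path[64];
              char buf[4096];
              int i, fd;

              memset(buf, 'x', sizeof(buf));
              for (i = 0; i < 100000; i++) {
                      snprintf(path, sizeof(path), "/mnt/lustre/failover_test.%d", i);
                      fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
                      if (fd < 0) {
                              fprintf(stderr, "create %s failed: %s\n",
                                      path, strerror(errno));
                              continue;
                      }
                      if (write(fd, buf, sizeof(buf)) < 0)
                              fprintf(stderr, "write %s failed: %s\n",
                                      path, strerror(errno));
                      close(fd);
              }
              return 0;
      }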

      Attachments

        1. client_lctl_dk_20130911.tgz
          18 kB
        2. client_messages_20130911.tgz
          228 kB
        3. ll10987.out.gz
          0.2 kB
        4. LU-XXXX.tgz
          463 kB
        5. mds1.llog.gz
          224 kB
        6. mds2.llog.gz
          0.2 kB
        7. server_lctl_dk_20130911.tgz
          392 kB
        8. ucbn003.localdomain.llog.gz
          0.2 kB

        Activity


          The fix that worked on my test system didn't work on the customer system, so it might have been a fluke.

          I haven't seen that message in the new logs. Here are the latest server dk logs:

          00000100:00100000:14.0:1381243911.199943:0:4598:0:(import.c:725:ptlrpc_connect_import()) @@@ (re)connect request (timeout 5)  req@ffff8808181fb400 x1448249214211642/t0(0) o8->pfscdat2-OST0000-osc-MDT0000@172.26.17.3@o2ib:28/4 lens 368/512 e 0 to 0 dl 0 ref 1 fl New:N/0/ffffffff rc 0/-1
          00000100:00100000:13.0:1381243911.212619:0:4596:0:(client.c:1773:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd:pfscdat2-MDT0000-mdtlov_UUID:-1:1448249214211637:172.26.17.3@o2ib:-1
          00020000:00020000:13.0:1381243911.212624:0:4596:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x10f5a sub-object on OST idx 0/1: rc = -11
          00000100:00100000:13.0:1381243911.225308:0:4596:0:(client.c:1773:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd:pfscdat2-MDT0000-mdtlov_UUID:-1:1448249214211638:172.26.17.3@o2ib:-1
          00020000:00020000:13.0:1381243911.225312:0:4596:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x18c7 sub-object on OST idx 0/1: rc = -11
          00000100:00100000:13.0:1381243911.250579:0:4596:0:(client.c:1773:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd:pfscdat2-MDT0000-mdtlov_UUID:-1:1448249214211639:172.26.17.3@o2ib:-1
          00020000:00020000:6.0:1381243911.250579:0:29944:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x10f5a sub-object on OST idx 0/1: rc = -5
          00000100:00100000:13.0:1381243911.250586:0:4596:0:(client.c:1434:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd:e76ce10d-1d27-cdc4-1091-649094c70331:4596:1448249214211641:172.26.17.1@o2ib:400
          00020000:00020000:5.0:1381243911.250612:0:4755:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x18c7 sub-object on OST idx 0/1: rc = -5
          00000100:00100000:13.0:1381243911.250638:0:4596:0:(client.c:1773:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd:pfscdat2-MDT0000-mdtlov_UUID:0:1448249214211644:172.26.17.3@o2ib:13
          00000100:00100000:5.0:1381243911.250741:0:4755:0:(service.c:1771:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc mdt_02:94b08aa2-54d7-b32e-019d-2561ed3286b5+6:58211:x1447047461344560:12345-172.26.4.3@o2ib:101 Request procesed in 49643595us (49643620us total) trans 0 rc 301/301
          
          kitwestneat Kit Westneat (Inactive) added a comment
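
          For readers decoding the rc values in these logs: -11 is EAGAIN, -5 is EIO, -19 is ENODEV and -107 is ENOTCONN. A trivial standalone helper (illustrative only, not part of the ticket) prints the mapping:

          /* Illustrative only: decode the negative rc values seen in the dk logs above. */
          #include <stdio.h>
          #include <string.h>

          int main(void)
          {
                  const struct { int rc; const char *name; } map[] = {
                          { -11,  "EAGAIN"   },
                          { -5,   "EIO"      },
                          { -19,  "ENODEV"   },
                          { -107, "ENOTCONN" },
                  };
                  size_t i;

                  for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
                          printf("rc = %4d  %-9s %s\n", map[i].rc, map[i].name,
                                 strerror(-map[i].rc));
                  return 0;
          }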

          Hi Kit,

          Sorry for the delayed response; we have been on holiday for the last several days.

          if "OSCC_FLAG_DEGRADED" is set, then this OST won't be used preferentially when creating object. do you only test the failover on one OSS?

          BTW, is there something like "Unknown rc XXX from async create: failing oscc" in the kernel logs?

          Thanks

          hongchao.zhang Hongchao Zhang added a comment
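
          To illustrate the semantics Hongchao describes, here is a small user-space model (not the actual Lustre allocator; the struct and helper are invented for illustration) in which a degraded target is only picked when no healthy target is available:

          /* Simplified model of "degraded OSTs are not preferred for object creation".
           * Flag names mirror the ones discussed above; everything else is invented. */
          #include <stdio.h>

          #define OSCC_FLAG_DEGRADED   0x01
          #define OSCC_FLAG_RECOVERING 0x02

          struct oscc_model {
                  unsigned int flags;
          };

          /* Pick the first healthy OST; fall back to a degraded one only if needed. */
          static int pick_ost(const struct oscc_model *oscs, int count)
          {
                  int i, fallback = -1;

                  for (i = 0; i < count; i++) {
                          if (oscs[i].flags & OSCC_FLAG_RECOVERING)
                                  continue;               /* unusable until recovery ends */
                          if (oscs[i].flags & OSCC_FLAG_DEGRADED) {
                                  if (fallback < 0)
                                          fallback = i;   /* remember, but don't prefer */
                                  continue;
                          }
                          return i;                       /* healthy target wins */
                  }
                  return fallback;
          }

          int main(void)
          {
                  const struct oscc_model oscs[] = {
                          { OSCC_FLAG_DEGRADED },         /* OST currently failing over */
                          { 0 },                          /* healthy OST */
                  };

                  printf("chosen OST idx: %d\n", pick_ost(oscs, 2));
                  return 0;
          }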

          I tested adding the degraded flag to the oscc and it seemed to fix the problem on my test system:

                  if (rc != 0 && rc != -ENOTCONN && rc != -ENODEV)
                          GOTO(out_wake, rc);

          +       if (rc == -ENOTCONN || rc == -ENODEV)
          +               oscc->oscc_flags |= OSCC_FLAG_DEGRADED;

                  /* Handle the critical type errors first.

          I ran into one issue where the OST ran out of inodes and it caused the lov_create threads on the MDT to deadlock. I'm not sure if that's a new behavior or in stock 2.1.6 as well. I'll investigate.

          kitwestneat Kit Westneat (Inactive) added a comment

          Hi Hongchao,

          Any updates?

          Thanks,
          Kit

          kitwestneat Kit Westneat (Inactive) added a comment

          Ah ok, that makes sense. Should handle_async_create add the RECOVERING flag to the oscc? It seems like osc_interpret_create would add it, but is it possible that it could get to oscc_internal_create without that flag?

          kitwestneat Kit Westneat (Inactive) added a comment

          Hi,
          The fake requests are queued onto osc_creator.oscc_wait_create_list in osc_create_async to wait for object creation to complete, and they are woken up in osc_interpret_create with the value returned from the OST. In this case, -ENOTCONN or -ENODEV is returned and fails these fake requests, which should instead keep waiting for the OST recovery to complete; the objects will be ready then.

          I'll create a new debug patch together with http://review.whamcloud.com/#/c/7559/

          Thanks!

          hongchao.zhang Hongchao Zhang added a comment
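
          A simplified, single-threaded sketch of the behaviour described above (not the real osc_create code; the enum and helper are invented): a queued fake create request should stay on the wait list for transient errors like -ENOTCONN or -ENODEV, and only complete or fail on a definitive reply:

          /* Simplified model: decide the fate of one queued "fake" create request
           * given the reply code from the OST. */
          #include <errno.h>
          #include <stdio.h>

          enum req_state { REQ_WAITING, REQ_DONE, REQ_FAILED };

          static enum req_state interpret_create(int rc)
          {
                  switch (rc) {
                  case 0:
                          return REQ_DONE;        /* object created, wake the waiter */
                  case -ENOTCONN:
                  case -ENODEV:
                          return REQ_WAITING;     /* target in recovery: keep waiting */
                  default:
                          return REQ_FAILED;      /* hard error: fail the request */
                  }
          }

          int main(void)
          {
                  const int replies[] = { -ENOTCONN, -ENODEV, -ENOSPC, 0 };
                  const char *names[] = { "WAITING", "DONE", "FAILED" };
                  size_t i;

                  for (i = 0; i < sizeof(replies) / sizeof(replies[0]); i++)
                          printf("rc=%d -> %s\n", replies[i],
                                 names[interpret_create(replies[i])]);
                  return 0;
          }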

          Hi Hongchao,

          I was trying to follow the create logic and it has me a bit confused. It seems like the fake requests are handled by oscc_internal_create, but if the oscc is recovering (due to the EAGAIN), all it does is return 0. Am I reading that right?

          Thanks,
          Kit

          kitwestneat Kit Westneat (Inactive) added a comment
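
          A minimal model of the path Kit is asking about (again, not the actual oscc_internal_create; the names are illustrative): when the creator is flagged as recovering, the internal create path does no work and returns 0, so the queued request is only serviced once recovery clears the flag:

          /* Minimal model of deferral during recovery.  Returning 0 both when the
           * call defers and when it queues work is what makes the flow easy to
           * misread when following the code. */
          #include <stdio.h>

          #define OSCC_FLAG_RECOVERING 0x01

          struct oscc_model {
                  unsigned int flags;
                  int          precreated;     /* objects available for allocation */
          };

          static int internal_create(struct oscc_model *oscc)
          {
                  if (oscc->flags & OSCC_FLAG_RECOVERING)
                          return 0;             /* defer: nothing happens right now */

                  oscc->precreated += 32;       /* pretend a precreate RPC was sent */
                  return 0;
          }

          int main(void)
          {
                  struct oscc_model oscc = { OSCC_FLAG_RECOVERING, 0 };

                  internal_create(&oscc);
                  printf("recovering: precreated=%d\n", oscc.precreated);

                  oscc.flags = 0;
                  internal_create(&oscc);
                  printf("recovered:  precreated=%d\n", oscc.precreated);
                  return 0;
          }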

          Hi Hongchao,

          The patch fixed the data corruption, but we are still seeing the same errors on failover. Should we reproduce the issue with the original debugging patch + fix? Here are the kernel messages from the run with the fix (patch 7559):

          Sep 25 15:16:49 pfscn2 kernel: : LustreError: 11-0: an error occurred while communicating with 172.26.17.3@o2ib. The ost_setattr operation failed with -19
          Sep 25 15:16:49 pfscn2 kernel: : Lustre: pfscdat2-OST0000-osc-MDT0000: Connection to pfscdat2-OST0000 (at 172.26.17.3@o2ib) was lost; in progress operations using this service will wait for recovery to complete
          Sep 25 15:17:39 pfscn2 kernel: : LustreError: 9001:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x3288 sub-object on OST idx 0/1: rc = -11
          Sep 25 15:17:39 pfscn2 kernel: : LustreError: 7308:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x3288 sub-object on OST idx 0/1: rc = -5
          Sep 25 15:17:40 pfscn2 kernel: : Lustre: pfscdat2-OST0000-osc-MDT0000: Connection restored to pfscdat2-OST0000 (at 172.26.17.4@o2ib)
          Sep 25 15:17:40 pfscn2 kernel: : Lustre: MDS mdd_obd-pfscdat2-MDT0000: pfscdat2-OST0000_UUID now active, resetting orphans
          
          kitwestneat Kit Westneat (Inactive) added a comment

          Sorry, there was an error in the previous patch; it has been updated (http://review.whamcloud.com/#/c/7559/).

          hongchao.zhang Hongchao Zhang added a comment

          I should add that we didn't see the problem anymore after downgrading to stock 2.1.5. We did, however, see the issue with both 2.1.6+patch and 2.1.5+patch.

          kitwestneat Kit Westneat (Inactive) added a comment

          Hi Hongchao,

          We've run into a problem with this patch. After upgrading the MDT, we are starting to get data corruption on the system. Here is a description of the issue:

          Another admin has been working on the test file system this morning.
          First, she could execute normal commands on the file system. Then
          normal commands (like vi) appended 112 MB of rubbish to files. Even a
          touch of a non-existent file created a 112 MB file.

          Another admin logged in and his .Xauthority had grown. He looked at
          the binary data and it seemed like the additional data came from a
          software package installed on the same Lustre file system, i.e. the
          rubbish is not arbitrary data but appears to come from another
          location.

          root@iccn999:/software/all/tsm/sbin# touch gaga1
          Wed Sep 11-14:38:21 (14/1012) - ACTIVE
          root@iccn999:/software/all/tsm/sbin# ls -l gaga1
          -rw-r--r-- 1 root root 116430464 Sep 11 14:38 gaga1

          On another client the behaviour is different:
          root@iccn996:/software/all/tsm/sbin# touch gaga2
          touch: setting times of `gaga2': No such file or directory
          Wed Sep 11-14:39:15 (5/41)
          root@iccn996:/software/all/tsm/sbin# ls -l gaga2
          -rw-r--r-- 1 root root 0 Sep 11 14:39 gaga2

          I will upload the lctl dk logs (with vfstrace and rpctrace). Is there any other information we should get?

          kitwestneat Kit Westneat (Inactive) added a comment

          People

            Assignee: hongchao.zhang Hongchao Zhang
            Reporter: gshilamkar Girish Shilamkar (Inactive)
            Votes: 0
            Watchers: 10