LU-3645: Interop 2.1.5 <--> 2.4: Write operations during failover error out instead of stalling

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • None
    • Affects Versions: Lustre 2.4.0, Lustre 2.1.5
    • 3
    • 9382

    Description

      During acceptance testing for KIT, they tried OSS failover while running several applications, and the applications got I/O errors ("can't create file" and similar messages). This should not happen; I/O should simply stall until failover completes.
      The clients were running 2.4 and the servers 2.1.5. We tried with 2.1.5 clients and did not see this issue. I have attached the client and server logs.
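
      For reference, a minimal user-space sketch of the kind of workload that exposes the symptom: it repeatedly creates and writes files and reports any error instead of blocking. This is not taken from the ticket; the mount point /mnt/lustre and the iteration count are illustrative assumptions.

      /* Illustrative workload sketch: create/write in a loop and report errors.
       * During OSS failover the expected behaviour is that open()/write() block,
       * not that they fail; any error printed here reproduces the symptom. */
      #include <errno.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>

      int main(void)
      {
              char path[64];
              char buf[4096];
              int i, fd;

              memset(buf, 'x', sizeof(buf));
              for (i = 0; i < 100000; i++) {
                      snprintf(path, sizeof(path), "/mnt/lustre/failover_test.%d", i);
                      fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
                      if (fd < 0) {
                              fprintf(stderr, "create %s failed: %s\n",
                                      path, strerror(errno));
                              continue;
                      }
                      if (write(fd, buf, sizeof(buf)) < 0)
                              fprintf(stderr, "write %s failed: %s\n",
                                      path, strerror(errno));
                      close(fd);
              }
              return 0;
      }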

      Attachments

        1. client_lctl_dk_20130911.tgz
          18 kB
        2. client_messages_20130911.tgz
          228 kB
        3. ll10987.out.gz
          0.2 kB
        4. LU-XXXX.tgz
          463 kB
        5. mds1.llog.gz
          224 kB
        6. mds2.llog.gz
          0.2 kB
        7. server_lctl_dk_20130911.tgz
          392 kB
        8. ucbn003.localdomain.llog.gz
          0.2 kB

        Activity


          The fix that worked on my test system didn't work on the customer system, so it might have been a fluke.

          I haven't seen that message in the new logs. Here are the latest server dk logs:

          00000100:00100000:14.0:1381243911.199943:0:4598:0:(import.c:725:ptlrpc_connect_import()) @@@ (re)connect request (timeout 5)  req@ffff8808181fb400 x1448249214211642/t0(0) o8->pfscdat2-OST0000-osc-MDT0000@172.26.17.3@o2ib:28/4 lens 368/512 e 0 to 0 dl 0 ref 1 fl New:N/0/ffffffff rc 0/-1
          00000100:00100000:13.0:1381243911.212619:0:4596:0:(client.c:1773:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd:pfscdat2-MDT0000-mdtlov_UUID:-1:1448249214211637:172.26.17.3@o2ib:-1
          00020000:00020000:13.0:1381243911.212624:0:4596:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x10f5a sub-object on OST idx 0/1: rc = -11
          00000100:00100000:13.0:1381243911.225308:0:4596:0:(client.c:1773:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd:pfscdat2-MDT0000-mdtlov_UUID:-1:1448249214211638:172.26.17.3@o2ib:-1
          00020000:00020000:13.0:1381243911.225312:0:4596:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x18c7 sub-object on OST idx 0/1: rc = -11
          00000100:00100000:13.0:1381243911.250579:0:4596:0:(client.c:1773:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd:pfscdat2-MDT0000-mdtlov_UUID:-1:1448249214211639:172.26.17.3@o2ib:-1
          00020000:00020000:6.0:1381243911.250579:0:29944:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x10f5a sub-object on OST idx 0/1: rc = -5
          00000100:00100000:13.0:1381243911.250586:0:4596:0:(client.c:1434:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc ptlrpcd:e76ce10d-1d27-cdc4-1091-649094c70331:4596:1448249214211641:172.26.17.1@o2ib:400
          00020000:00020000:5.0:1381243911.250612:0:4755:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x18c7 sub-object on OST idx 0/1: rc = -5
          00000100:00100000:13.0:1381243911.250638:0:4596:0:(client.c:1773:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc ptlrpcd:pfscdat2-MDT0000-mdtlov_UUID:0:1448249214211644:172.26.17.3@o2ib:13
          00000100:00100000:5.0:1381243911.250741:0:4755:0:(service.c:1771:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc mdt_02:94b08aa2-54d7-b32e-019d-2561ed3286b5+6:58211:x1447047461344560:12345-172.26.4.3@o2ib:101 Request procesed in 49643595us (49643620us total) trans 0 rc 301/301
          
          kitwestneat Kit Westneat (Inactive) added a comment
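
          For readers decoding the rc values in these logs: -11 is EAGAIN, -5 is EIO, -19 is ENODEV and -107 is ENOTCONN. A trivial standalone helper (illustrative only, not part of the ticket) prints the mapping:

          /* Illustrative only: decode the negative rc values seen in the dk logs above. */
          #include <stdio.h>
          #include <string.h>

          int main(void)
          {
                  const struct { int rc; const char *name; } map[] = {
                          { -11,  "EAGAIN"   },
                          { -5,   "EIO"      },
                          { -19,  "ENODEV"   },
                          { -107, "ENOTCONN" },
                  };
                  size_t i;

                  for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
                          printf("rc = %4d  %-9s %s\n", map[i].rc, map[i].name,
                                 strerror(-map[i].rc));
                  return 0;
          }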

          Hi Kit,

          Sorry for the delayed response; we have been on holiday for the last several days.

          if "OSCC_FLAG_DEGRADED" is set, then this OST won't be used preferentially when creating object. do you only test the failover on one OSS?

          BTW, is there something like "Unknown rc XXX from async create: failing oscc" in the kernel logs?

          Thanks

          hongchao.zhang Hongchao Zhang added a comment
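
          To illustrate the semantics Hongchao describes, here is a small user-space model (not the actual Lustre allocator; the struct and helper are invented for illustration) in which a degraded target is only picked when no healthy target is available:

          /* Simplified model of "degraded OSTs are not preferred for object creation".
           * Flag names mirror the ones discussed above; everything else is invented. */
          #include <stdio.h>

          #define OSCC_FLAG_DEGRADED   0x01
          #define OSCC_FLAG_RECOVERING 0x02

          struct oscc_model {
                  unsigned int flags;
          };

          /* Pick the first healthy OST; fall back to a degraded one only if needed. */
          static int pick_ost(const struct oscc_model *oscs, int count)
          {
                  int i, fallback = -1;

                  for (i = 0; i < count; i++) {
                          if (oscs[i].flags & OSCC_FLAG_RECOVERING)
                                  continue;               /* unusable until recovery ends */
                          if (oscs[i].flags & OSCC_FLAG_DEGRADED) {
                                  if (fallback < 0)
                                          fallback = i;   /* remember, but don't prefer */
                                  continue;
                          }
                          return i;                       /* healthy target wins */
                  }
                  return fallback;
          }

          int main(void)
          {
                  const struct oscc_model oscs[] = {
                          { OSCC_FLAG_DEGRADED },         /* OST currently failing over */
                          { 0 },                          /* healthy OST */
                  };

                  printf("chosen OST idx: %d\n", pick_ost(oscs, 2));
                  return 0;
          }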

          I tested adding the degraded flag to the oscc and it seemed to fix the problem on my test system:

                  if (rc != 0 && rc != -ENOTCONN && rc != -ENODEV)
                          GOTO(out_wake, rc);

          +       if (rc == -ENOTCONN || rc == -ENODEV)
          +               oscc->oscc_flags |= OSCC_FLAG_DEGRADED;

                  /* Handle the critical type errors first.

          I ran into one issue where the OST ran out of inodes and it caused the lov_create threads on the MDT to deadlock. I'm not sure if that's a new behavior or in stock 2.1.6 as well. I'll investigate.

          kitwestneat Kit Westneat (Inactive) added a comment

          Hi Hongchao,

          Any updates?

          Thanks,
          Kit

          kitwestneat Kit Westneat (Inactive) added a comment

          Ah ok, that makes sense. Should handle_async_create add the RECOVERING flag to the oscc? It seems like osc_interpret_create would add it, but is it possible that it could get to oscc_internal_create without that flag?

          kitwestneat Kit Westneat (Inactive) added a comment

          Hi,
          The fake requests are queued onto osc_creator.oscc_wait_create_list in osc_create_async to wait for object creation to complete, and they are woken up in osc_interpret_create with the value returned from the OST. In this case, -ENOTCONN or -ENODEV is returned and fails these fake requests, which should instead keep waiting for the OST recovery to complete; the objects will be ready then.

          I'll create a new debug patch together with http://review.whamcloud.com/#/c/7559/

          Thanks!

          hongchao.zhang Hongchao Zhang added a comment
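
          A simplified, single-threaded sketch of the behaviour described above (not the real osc_create code; the enum and helper are invented): a queued fake create request should stay on the wait list for transient errors like -ENOTCONN or -ENODEV, and only complete or fail on a definitive reply:

          /* Simplified model: decide the fate of one queued "fake" create request
           * given the reply code from the OST. */
          #include <errno.h>
          #include <stdio.h>

          enum req_state { REQ_WAITING, REQ_DONE, REQ_FAILED };

          static enum req_state interpret_create(int rc)
          {
                  switch (rc) {
                  case 0:
                          return REQ_DONE;        /* object created, wake the waiter */
                  case -ENOTCONN:
                  case -ENODEV:
                          return REQ_WAITING;     /* target in recovery: keep waiting */
                  default:
                          return REQ_FAILED;      /* hard error: fail the request */
                  }
          }

          int main(void)
          {
                  const int replies[] = { -ENOTCONN, -ENODEV, -ENOSPC, 0 };
                  const char *names[] = { "WAITING", "DONE", "FAILED" };
                  size_t i;

                  for (i = 0; i < sizeof(replies) / sizeof(replies[0]); i++)
                          printf("rc=%d -> %s\n", replies[i],
                                 names[interpret_create(replies[i])]);
                  return 0;
          }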

          Hi Hongchao,

          I was trying to follow the create logic and it has me a bit confused. It seems like the fake requests are handled by oscc_internal_create, but if the oscc is recovering (due to the EAGAIN), all it does is return 0. Am I reading that right?

          Thanks,
          Kit

          kitwestneat Kit Westneat (Inactive) added a comment
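
          A minimal model of the path Kit is asking about (again, not the actual oscc_internal_create; the names are illustrative): when the creator is flagged as recovering, the internal create path does no work and returns 0, so the queued request is only serviced once recovery clears the flag:

          /* Minimal model of deferral during recovery.  Returning 0 both when the
           * call defers and when it queues work is what makes the flow easy to
           * misread when following the code. */
          #include <stdio.h>

          #define OSCC_FLAG_RECOVERING 0x01

          struct oscc_model {
                  unsigned int flags;
                  int          precreated;     /* objects available for allocation */
          };

          static int internal_create(struct oscc_model *oscc)
          {
                  if (oscc->flags & OSCC_FLAG_RECOVERING)
                          return 0;             /* defer: nothing happens right now */

                  oscc->precreated += 32;       /* pretend a precreate RPC was sent */
                  return 0;
          }

          int main(void)
          {
                  struct oscc_model oscc = { OSCC_FLAG_RECOVERING, 0 };

                  internal_create(&oscc);
                  printf("recovering: precreated=%d\n", oscc.precreated);

                  oscc.flags = 0;
                  internal_create(&oscc);
                  printf("recovered:  precreated=%d\n", oscc.precreated);
                  return 0;
          }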

          Hi Hongchao,

          The patch fixed the data corruption, but we are still seeing the same errors on failover. Should we reproduce the issue with the original debugging patch + fix? Here are the kernel messages from the run with the fix (patch 7559):

          Sep 25 15:16:49 pfscn2 kernel: : LustreError: 11-0: an error occurred while communicating with 172.26.17.3@o2ib. The ost_setattr operation failed with -19
          Sep 25 15:16:49 pfscn2 kernel: : Lustre: pfscdat2-OST0000-osc-MDT0000: Connection to pfscdat2-OST0000 (at 172.26.17.3@o2ib) was lost; in progress operations using this service will wait for recovery to complete
          Sep 25 15:17:39 pfscn2 kernel: : LustreError: 9001:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x3288 sub-object on OST idx 0/1: rc = -11
          Sep 25 15:17:39 pfscn2 kernel: : LustreError: 7308:0:(lov_request.c:579:lov_update_create_set()) error creating fid 0x3288 sub-object on OST idx 0/1: rc = -5
          Sep 25 15:17:40 pfscn2 kernel: : Lustre: pfscdat2-OST0000-osc-MDT0000: Connection restored to pfscdat2-OST0000 (at 172.26.17.4@o2ib)
          Sep 25 15:17:40 pfscn2 kernel: : Lustre: MDS mdd_obd-pfscdat2-MDT0000: pfscdat2-OST0000_UUID now active, resetting orphans
          
          kitwestneat Kit Westneat (Inactive) added a comment

          Sorry, there was an error in the previous patch; it has been updated (http://review.whamcloud.com/#/c/7559/).

          hongchao.zhang Hongchao Zhang added a comment

          I should add that we didn't see the problem anymore after downgrading to stock 2.1.5. We did, however, see the issue with both 2.1.6+patch and 2.1.5+patch.

          kitwestneat Kit Westneat (Inactive) added a comment

          Hi Hongchao,

          We've run into a problem with this patch. After upgrading the MDT, we are starting to get data corruption on the system. Here is a description of the issue:

          Another admin has been working on the test file system this morning.
          First, she could execute normal commands on the file system. Then
          normal commands (like vi) appended 112 MB of rubbish to files. Even a
          touch of a non-existent file created a 112 MB file.

          Another admin logged in and his .Xauthority had grown. He looked at
          the binary data and it seemed like the additional data came from a
          software package installed on the same Lustre file system, i.e. the
          rubbish is not arbitrary data but appears to come from another
          location.

          root@iccn999:/software/all/tsm/sbin# touch gaga1
          Wed Sep 11-14:38:21 (14/1012) - ACTIVE
          root@iccn999:/software/all/tsm/sbin# ls -l gaga1
          -rw-r--r-- 1 root root 116430464 Sep 11 14:38 gaga1

          On another client the behaviour is different:
          root@iccn996:/software/all/tsm/sbin# touch gaga2
          touch: setting times of `gaga2': No such file or directory
          Wed Sep 11-14:39:15 (5/41)
          root@iccn996:/software/all/tsm/sbin# ls -l gaga2
          -rw-r--r-- 1 root root 0 Sep 11 14:39 gaga2

          I will upload the lctl dk logs (with vfstrace and rpctrace). Is there any other information we should get?

          kitwestneat Kit Westneat (Inactive) added a comment

          People

            Assignee: hongchao.zhang Hongchao Zhang
            Reporter: gshilamkar Girish Shilamkar (Inactive)
            Votes: 0
            Watchers: 10