Lustre / LU-463

orphan recovery happens too late, causing writes to fail with ENOENT after recovery

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.0, Lustre 2.2.0, Lustre 2.1.1, Lustre 2.1.2, Lustre 2.1.3, Lustre 2.1.4, Lustre 2.1.5, Lustre 1.8.8, Lustre 1.8.6, Lustre 1.8.9, Lustre 2.1.6
    • Labels: None
    • Severity: 3
    • Bugzilla ID: 22777
    • Rank: 5680

    Description

      While running recovery-mds-scale with FLAVOR=OSS, it failed as follows after running for about 3 hours:

      ==== Checking the clients loads AFTER  failover -- failure NOT OK
      ost5 has failed over 5 times, and counting...
      sleeping 246 seconds ... 
      tar: etc/rc.d/rc6.d/K88rsyslog: Cannot stat: No such file or directory
      tar: Exiting with failure status due to previous errors
      Found the END_RUN_FILE file: /home/yujian/test_logs/end_run_file
      client-21-ib
      Client load failed on node client-21-ib
      
      client client-21-ib load stdout and debug files :
                    /tmp/recovery-mds-scale.log_run_tar.sh-client-21-ib
                    /tmp/recovery-mds-scale.log_run_tar.sh-client-21-ib.debug
      2011-06-26 08:08:03 Terminating clients loads ...
      Duration:                86400
      Server failover period: 600 seconds
      Exited after:           13565 seconds
      Number of failovers before exit:
      mds: 0 times
      ost1: 2 times
      ost2: 6 times
      ost3: 3 times
      ost4: 4 times
      ost5: 5 times
      ost6: 3 times
      Status: FAIL: rc=1
      

      Syslog on client node client-21-ib showed that:

      Jun 26 08:03:55 client-21 kernel: Lustre: DEBUG MARKER: ost5 has failed over 5 times, and counting...
      Jun 26 08:04:20 client-21 kernel: LustreError: 18613:0:(client.c:2347:ptlrpc_replay_interpret()) @@@ status -2, old was 0  req@ffff88031daf6c00 x1372677268199869/t98784270264 o2->lustre-OST0005_UUID@192.168.4.132@o2ib:28/4 lens 400/592 e 0 to 1 dl 1309100718 ref 2 fl Interpret:R/4/0 rc -2/-2
      

      Syslog on the MDS node client-10-ib showed that:

      Jun 26 08:03:57 client-10-ib kernel: Lustre: DEBUG MARKER: ost5 has failed over 5 times, and counting...
      Jun 26 08:04:22 client-10-ib kernel: LustreError: 17651:0:(client.c:2347:ptlrpc_replay_interpret()) @@@ status -2, old was 0  req@ffff810320674400 x1372677249608261/t98784270265 o2->lustre-OST0005_UUID@192.168.4.132@o2ib:28/4 lens 400/592 e 0 to 1 dl 1309100720 ref 2 fl Interpret:R/4/0 rc -2/-2
      

      Syslog on the OSS node fat-amd-1-ib showed that:

      Jun 26 08:03:57 fat-amd-1-ib kernel: Lustre: DEBUG MARKER: ost5 has failed over 5 times, and counting...
      Jun 26 08:04:21 fat-amd-1-ib kernel: Lustre: 6278:0:(ldlm_lib.c:1815:target_queue_last_replay_reply()) lustre-OST0005: 5 recoverable clients remain
      Jun 26 08:04:21 fat-amd-1-ib kernel: Lustre: 6278:0:(ldlm_lib.c:1815:target_queue_last_replay_reply()) Skipped 2 previous similar messages
      Jun 26 08:04:21 fat-amd-1-ib kernel: LustreError: 6336:0:(ldlm_resource.c:862:ldlm_resource_add()) filter-lustre-OST0005_UUID: lvbo_init failed for resource 161916: rc -2
      Jun 26 08:04:21 fat-amd-1-ib kernel: LustreError: 6336:0:(ldlm_resource.c:862:ldlm_resource_add()) Skipped 18 previous similar messages
      Jun 26 08:04:25 fat-amd-1-ib kernel: LustreError: 7708:0:(filter_log.c:135:filter_cancel_cookies_cb()) error cancelling log cookies: rc = -19
      Jun 26 08:04:25 fat-amd-1-ib kernel: LustreError: 7708:0:(filter_log.c:135:filter_cancel_cookies_cb()) Skipped 8 previous similar messages
      Jun 26 08:04:25 fat-amd-1-ib kernel: Lustre: lustre-OST0005: Recovery period over after 0:05, of 6 clients 6 recovered and 0 were evicted.
      Jun 26 08:04:25 fat-amd-1-ib kernel: Lustre: lustre-OST0005: sending delayed replies to recovered clients
      Jun 26 08:04:25 fat-amd-1-ib kernel: Lustre: lustre-OST0005: received MDS connection from 192.168.4.10@o2ib
      

      Maloo report: https://maloo.whamcloud.com/test_sets/f1c2fd72-a067-11e0-aee5-52540025f9af

      Please find the debug logs in the attachment.

      This is a known issue: bug 22777.

      Attachments

        Issue Links

          Activity

            [LU-463] orphan recovery happens too late, causing writes to fail with ENOENT after recovery

            hongchao.zhang Hongchao Zhang added a comment -

            The patch against b2_1 is under creation and testing.
            yujian Jian Yu added a comment -

            This has been blocking the recovery-mds-scale failover_ost test.


            hongchao.zhang Hongchao Zhang added a comment -

            How about fixing the bug by waiting for some time when -2 (ENOENT) is encountered on an OST that is still in recovery mode? I will produce a patch along these lines.
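            A minimal, self-contained C sketch of the retry-on-ENOENT idea proposed above (illustrative only, not the actual Lustre patch; send_replay(), target_in_recovery(), and replay_with_backoff() are hypothetical stand-ins for the real client replay machinery):

            /*
             * Illustrative sketch only, not the actual Lustre patch.  It models the
             * idea above: if a replayed request completes with -ENOENT while the
             * target (OST) is still in recovery, back off and retry instead of
             * treating the error as fatal.  send_replay() and target_in_recovery()
             * are hypothetical stand-ins for the real client replay machinery.
             */
            #include <errno.h>
            #include <stdbool.h>
            #include <stdio.h>
            #include <unistd.h>

            /* Hypothetical: pretend the first few replays race with recovery. */
            static int send_replay(int attempt)
            {
                    return attempt < 3 ? -ENOENT : 0;
            }

            /* Hypothetical: whether the target still reports itself as recovering. */
            static bool target_in_recovery(int attempt)
            {
                    return attempt < 3;
            }

            static int replay_with_backoff(int max_retries)
            {
                    int attempt;

                    for (attempt = 0; attempt <= max_retries; attempt++) {
                            int rc = send_replay(attempt);

                            /* Success, or a genuinely missing object: return as-is. */
                            if (rc != -ENOENT || !target_in_recovery(attempt))
                                    return rc;

                            fprintf(stderr,
                                    "replay got -ENOENT during recovery, retrying (%d)\n",
                                    attempt + 1);
                            sleep(1);       /* crude fixed backoff for the sketch */
                    }
                    return -ENOENT;
            }

            int main(void)
            {
                    int rc = replay_with_backoff(5);

                    printf("replay finished: rc = %d\n", rc);
                    return rc ? 1 : 0;
            }

            In a real fix the wait would presumably be bounded by the target's recovery window rather than a fixed one-second sleep.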
            pjones Peter Jones added a comment -

            Hongchao

            Could you please look into this one?

            Thanks

            Peter

            yujian Jian Yu added a comment - Another instance: https://maloo.whamcloud.com/test_sets/f99459d2-eb26-11e1-b137-52540035b04c
            yujian Jian Yu added a comment -

            Lustre Tag: v2_1_3_RC1
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/113/
            Distro/Arch: RHEL6.3/x86_64 (kernel version: 2.6.32-279.2.1.el6)
            Network: IB (in-kernel OFED)
            ENABLE_QUOTA=yes
            FAILURE_MODE=HARD

            The issue occurred again while running recovery-mds-scale failover_ost test:
            https://maloo.whamcloud.com/test_sets/b18a1330-e5ad-11e1-ae4e-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Tag: v2_1_2_RC2
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/86/
            Distro/Arch: RHEL6.2/x86_64
            Network: TCP (1GigE)
            ENABLE_QUOTA=yes
            FAILURE_MODE=HARD

            The same issue occurred while failing over OST: https://maloo.whamcloud.com/test_sets/c9193e08-abca-11e1-9b8f-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Tag: v1_8_8_WC1_RC1
            Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/195/
            Distro/Arch: RHEL5.8/x86_64(server), RHEL6.2/x86_64(client)
            Network: TCP (1GigE)
            ENABLE_QUOTA=yes
            FAILURE_MODE=HARD

            The same issue occurred while failing over OST: https://maloo.whamcloud.com/test_sets/be9c60e0-9e82-11e1-9080-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Tag: v2_2_0_0_RC2
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_2/17/
            Distro/Arch: SLES11SP1/x86_64(client), RHEL6.2/x86_64(server)
            Network: TCP (1GigE)
            ENABLE_QUOTA=yes
            FAILURE_MODE=HARD

            The same issue occurred while failing over OST: https://maloo.whamcloud.com/test_sets/b6eb20c8-799f-11e1-9d2a-5254004bbbd3

            yujian Jian Yu added a comment -

            Lustre Tag: v2_1_1_0_RC4
            Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/44/
            e2fsprogs Build: http://build.whamcloud.com/job/e2fsprogs-master/217/
            Distro/Arch: RHEL6/x86_64 (kernel version: 2.6.32-220.el6)
            Network: IB (in-kernel OFED)
            ENABLE_QUOTA=yes
            FAILURE_MODE=HARD
            FLAVOR=OSS

            Configuration:

            MGS/MDS Nodes: client-8-ib
            
            OSS Nodes: client-18-ib(active), client-19-ib(active)
                                          \ /
                                          OST1 (active in client-18-ib)
                                          OST2 (active in client-19-ib)
                                          OST3 (active in client-18-ib)
                                          OST4 (active in client-19-ib)
                                          OST5 (active in client-18-ib)
                                          OST6 (active in client-19-ib)
                       client-9-ib(OST7)
            
            Client Nodes: client-[1,4,17],fat-amd-2,fat-intel-2
            
            Network Addresses:
            client-1-ib: 192.168.4.1
            client-4-ib: 192.168.4.4
            client-8-ib: 192.168.4.8
            client-9-ib: 192.168.4.9
            client-17-ib: 192.168.4.17
            client-18-ib: 192.168.4.18
            client-19-ib: 192.168.4.19
            fat-amd-2-ib: 192.168.4.133
            fat-intel-2-ib: 192.168.4.129
            

            While running recovery-mds-scale with FLAVOR=OSS, it failed as follows:

            ==== Checking the clients loads AFTER  failover -- failure NOT OK
            ost1 has failed over 1 times, and counting...
            sleeping 717 seconds ...
            tar: etc/selinux/targeted/contexts/users/root: Cannot write: No such file or directory
            tar: Exiting with failure status due to previous errors
            Found the END_RUN_FILE file: /home/yujian/test_logs/end_run_file
            client-1-ib
            Client load failed on node client-1-ib
            
            client client-1-ib load stdout and debug files :
                          /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib
                          /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib.debug
            

            /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib:

            tar: etc/selinux/targeted/contexts/users/root: Cannot write: No such file or directory
            tar: Exiting with failure status due to previous errors
            

            /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib.debug

            <~snip~>
            2012-02-22 03:56:04: tar run starting
            + mkdir -p /mnt/lustre/d0.tar-client-1-ib
            + cd /mnt/lustre/d0.tar-client-1-ib
            + wait 11196
            + do_tar
            + tar cf - /etc
            + tar xf -
            + tee /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib
            tar: Removing leading `/' from member names
            + return 2
            + RC=2
            ++ grep 'exit delayed from previous errors' /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib
            + PREV_ERRORS=
            + true
            + '[' 2 -ne 0 -a '' -a '' ']'
            + '[' 2 -eq 0 ']'
            ++ date '+%F %H:%M:%S'
            + echoerr '2012-02-22 03:59:25: tar failed'
            + echo '2012-02-22 03:59:25: tar failed'
            2012-02-22 03:59:25: tar failed
            <~snip~>
            

            Syslog on client node client-1-ib showed that:

            Feb 22 03:59:12 client-1 kernel: Lustre: DEBUG MARKER: ost1 has failed over 1 times, and counting...
            Feb 22 03:59:19 client-1 kernel: LustreError: 10064:0:(client.c:2590:ptlrpc_replay_interpret()) @@@ status -2, old was 0  req@ffff88031d605c00 x1394513519058221/t379(379) o-1->lustre-OST0004_UUID@192.168.4.19@o2ib:28/4 lens 408/400 e 0 to 0 dl 1329912005 ref 2 fl Interpret:R/ffffffff/ffffffff rc -2/-1
            Feb 22 03:59:19 client-1 kernel: LustreError: 10064:0:(client.c:2590:ptlrpc_replay_interpret()) Skipped 4 previous similar messages
            Feb 22 03:59:19 client-1 kernel: Lustre: lustre-OST0004-osc-ffff88032c89a400: Connection restored to service lustre-OST0004 using nid 192.168.4.19@o2ib.
            

            Syslog on MDS node client-8-ib showed that:

            Feb 22 03:59:12 client-8-ib kernel: Lustre: DEBUG MARKER: ost1 has failed over 1 times, and counting...
            Feb 22 03:59:19 client-8-ib kernel: LustreError: 5628:0:(client.c:2590:ptlrpc_replay_interpret()) @@@ status -2, old was 0  req@ffff88030708c400 x1394513506470444/t380(380) o-1->lustre-OST0004_UUID@192.168.4.19@o2ib:28/4 lens 408/400 e 0 to 0 dl 1329912005 ref 2 fl Interpret:R/ffffffff/ffffffff rc -2/-1
            Feb 22 03:59:19 client-8-ib kernel: LustreError: 5628:0:(client.c:2590:ptlrpc_replay_interpret()) Skipped 4 previous similar messages
            Feb 22 03:59:19 client-8-ib kernel: Lustre: lustre-OST0004-osc-MDT0000: Connection restored to service lustre-OST0004 using nid 192.168.4.19@o2ib.
            Feb 22 03:59:19 client-8-ib kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0004_UUID now active, resetting orphans
            Feb 22 03:59:19 client-8-ib kernel: Lustre: 7395:0:(quota_master.c:1760:mds_quota_recovery()) Only 3/7 OSTs are active, abort quota recovery
            

            Syslog on OSS node client-19-ib showed that:

            Feb 22 03:59:12 client-19-ib kernel: Lustre: DEBUG MARKER: ost1 has failed over 1 times, and counting...
            Feb 22 03:59:18 client-19-ib kernel: Lustre: 7501:0:(filter.c:2697:filter_connect_internal()) lustre-OST0004: Received MDS connection for group 0
            Feb 22 03:59:18 client-19-ib kernel: LustreError: 9874:0:(filter.c:4141:filter_destroy())  lustre-OST0004: can not find olg of group 0
            Feb 22 03:59:18 client-19-ib kernel: LustreError: 9874:0:(filter.c:4141:filter_destroy()) Skipped 22 previous similar messages
            Feb 22 03:59:19 client-19-ib kernel: Lustre: lustre-OST0004: sending delayed replies to recovered clients
            Feb 22 03:59:19 client-19-ib kernel: Lustre: lustre-OST0004: received MDS connection from 192.168.4.8@o2ib
            Feb 22 03:59:19 client-19-ib kernel: Lustre: 7530:0:(filter.c:2553:filter_llog_connect()) lustre-OST0004: Recovery from log 0xff506/0x0:8f36a744
            

            Please refer to /scratch/logs/2.1.1/recovery-oss-scale.1329912676.log.tar.bz2 on brent node for debug and other logs.

            yujian Jian Yu added a comment -

            Lustre Tag: v1_8_7_WC1_RC1
            Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/142/
            e2fsprogs Build: http://newbuild.whamcloud.com/job/e2fsprogs-master/65/
            Distro/Arch: RHEL5/x86_64(server, OFED 1.5.3.2, ext4-based ldiskfs), RHEL6/x86_64(client, in-kernel OFED)
            ENABLE_QUOTA=yes
            FAILURE_MODE=HARD
            FLAVOR=OSS

            recovery-mds-scale (FLAVOR=OSS) test failed with the same issue: https://maloo.whamcloud.com/test_sets/004f464c-f550-11e0-908b-52540025f9af

            Please refer to the attached recovery-oss-scale.1318474116.log.tar.bz2 for more logs.


            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: yujian Jian Yu
              Votes: 0
              Watchers: 15
