Lustre / LU-3230

conf-sanity fails to start run: umount of OST fails

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.4.0, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.4.2, Lustre 2.5.1
    • 3
    • 7893

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite runs:
      http://maloo.whamcloud.com/test_sets/bbe080da-ad17-11e2-bd7c-52540035b04c
      http://maloo.whamcloud.com/test_sets/51e42416-ad76-11e2-b72d-52540035b04c
      http://maloo.whamcloud.com/test_sets/842709fa-ad73-11e2-b72d-52540035b04c

      The sub-test conf-sanity failed with the following error:

      test failed to respond and timed out

      Info required for matching: conf-sanity conf-sanity
      Info required for matching: replay-single test_90

    Activity

            utopiabound Nathaniel Clark added a comment - back-port to b2_4 http://review.whamcloud.com/8591
            utopiabound Nathaniel Clark added a comment - edited

            It looks like this bug is fixed with the landing of #7995. Should I create a Gerrit patch to port it to b2_4 and b2_5?
            It will cherry-pick cleanly to the current heads of both branches.

            yujian Jian Yu added a comment - edited

            More instances on Lustre b2_4 branch:
            https://maloo.whamcloud.com/test_sets/dcb5daa6-6579-11e3-8518-52540035b04c
            https://maloo.whamcloud.com/test_sets/6c3ab5e4-6358-11e3-8c76-52540035b04c
            https://maloo.whamcloud.com/test_sets/d4b0f714-6281-11e3-a8fd-52540035b04c
            yujian Jian Yu added a comment -

            Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/63/
            Distro/Arch: RHEL6.4/x86_64 (server), SLES11SP2/x86_64 (client)

            replay-dual test 3 hit this failure:
            https://maloo.whamcloud.com/test_sets/20b3d072-5c98-11e3-956b-52540035b04c

            yujian Jian Yu added a comment -

            Lustre build: http://build.whamcloud.com/job/lustre-b2_4/58/
            Distro/Arch: RHEL6.4/x86_64

            FSTYPE=zfs
            MDSCOUNT=1
            MDSSIZE=2097152
            OSTCOUNT=2
            OSTSIZE=2097152

            obdfilter-survey test 3a hit the same failure:
            https://maloo.whamcloud.com/test_sets/19556f3e-5608-11e3-8e94-52540035b04c

            utopiabound Nathaniel Clark added a comment - http://review.whamcloud.com/7995
            yujian Jian Yu added a comment -

            Lustre build: http://build.whamcloud.com/job/lustre-b2_4/47/
            Distro/Arch: RHEL6.4/x86_64

            FSTYPE=zfs
            MDSCOUNT=1
            MDSSIZE=2097152
            OSTCOUNT=2
            OSTSIZE=2097152

            obdfilter-survey test 3a hit the same failure:
            https://maloo.whamcloud.com/test_sets/a488f632-4453-11e3-8472-52540035b04c


            utopiabound Nathaniel Clark added a comment -

            Debugging patch, to see whether 6988 was on the right track but not broad enough:
            http://review.whamcloud.com/7995

            utopiabound Nathaniel Clark added a comment -

            There have been two "recent" (Sept 2013) non-conf-sanity failures (both in replay-single):

            replay-single/74 https://maloo.whamcloud.com/test_sets/f441c460-227f-11e3-af6a-52540035b04c
            A review-dne-zfs failure on OST0000

            21:28:53:Lustre: DEBUG MARKER: umount -d /mnt/ost1
            21:28:53:Lustre: Failing over lustre-OST0000
            21:28:53:LustreError: 15640:0:(ost_handler.c:1782:ost_blocking_ast()) Error -2 syncing data on lock cancel
            21:28:53:Lustre: 15640:0:(service.c:2030:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (50:74s); client may timeout.  req@ffff880046d72c00 x1446662193136696/t0(0) o103->cea0ffc2-1873-4321-a1a2-348391764373@10.10.16.253@tcp:0/0 lens 328/192 e 0 to 0 dl 1379651120 ref 1 fl Complete:H/0/0 rc -19/-19
            21:28:53:LustreError: 7671:0:(ost_handler.c:1782:ost_blocking_ast()) Error -2 syncing data on lock cancel
            21:28:53:Lustre: lustre-OST0000: Not available for connect from 10.10.17.1@tcp (stopping)
            21:28:53:Lustre: Skipped 5 previous similar messages
            21:28:53:Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 7. Is it stuck?
            21:28:53:Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 7. Is it stuck?
            21:28:53:Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 7. Is it stuck?
            21:40:22:Lustre: lustre-OST0000 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 7. Is it stuck?
            

            The other is review run replay-single/53e https://maloo.whamcloud.com/test_sets/ddb85db2-208b-11e3-b9bc-52540035b04c (NOT ZFS)
            The MGS fails:

            03:55:06:Lustre: DEBUG MARKER: umount -d /mnt/mds1
            03:55:06:LustreError: 166-1: MGC10.10.4.154@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
            03:55:07:Lustre: MGS is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
            03:55:31:Lustre: MGS is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 5. Is it stuck?
            03:56:05:Lustre: MGS is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 5. Is it stuck?
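
            A note on the doubling intervals in these messages: during server umount the OBD device waits for all of its exports to be released, re-checking and complaining with a timeout that doubles each round (8 s, 16 s, 32 s, 64 s, ...), which is why the console shows exactly that progression while the refcount stays pinned. Below is a minimal stand-alone C sketch of that back-off pattern only; time is simulated, the names and numbers are made up for illustration, and this is not the actual Lustre export-cleanup code.

            /* Toy model: complain about a stuck export refcount at doubling
             * intervals, echoing the "is waiting for obd_unlinked_exports
             * more than N seconds" console messages above.  Time is simulated
             * so the program finishes instantly; the real server sleeps on a
             * waitqueue instead. */
            #include <stdio.h>

            int main(void)
            {
                int refcount = 7;   /* value reported in the console message     */
                int warn_at  = 8;   /* first complaint after 8 simulated seconds */
                int waited   = 0;

                /* Pretend the stuck exports are released after 70 "seconds";
                 * in the hung runs above this never happens, so the messages
                 * just keep doubling until the test run times out. */
                while (waited < 70) {
                    waited++;
                    if (waited >= warn_at) {
                        printf("lustre-OST0000 is waiting for obd_unlinked_exports "
                               "more than %d seconds. The obd refcount = %d. "
                               "Is it stuck?\n", warn_at, refcount);
                        warn_at <<= 1;  /* 8 -> 16 -> 32 -> 64, as in the logs */
                    }
                }
                printf("exports released after %d seconds; umount can proceed\n",
                       waited);
                return 0;
            }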
            
            utopiabound Nathaniel Clark added a comment - edited

            sanity/132 failures appear to be LU-4019.


            utopiabound Nathaniel Clark added a comment -

            The sanity/132 failures seem to share the following OST logs:

            15:51:18:Lustre: DEBUG MARKER: == sanity test 132: som avoids glimpse rpc == 15:50:26 (1380581426)
            15:51:18:LustreError: 23533:0:(ost_handler.c:1775:ost_blocking_ast()) Error -2 syncing data on lock cancel
            15:51:18:Lustre: lustre-OST0006: Client lustre-MDT0000-mdtlov_UUID (at 10.10.16.120@tcp) reconnecting
            15:51:18:Lustre: lustre-OST0006: Client lustre-MDT0000-mdtlov_UUID (at 10.10.16.120@tcp) refused reconnection, still busy with 1 active RPCs
            15:51:18:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n ost.OSS.ost.stats
            15:51:18:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n ost.OSS.ost.stats
            15:51:18:Lustre: lustre-OST0006: Client lustre-MDT0000-mdtlov_UUID (at 10.10.16.120@tcp) reconnecting
            15:51:18:Lustre: lustre-OST0006: Client lustre-MDT0000-mdtlov_UUID (at 10.10.16.120@tcp) refused reconnection, still busy with 1 active RPCs
            15:51:18:LustreError: 11-0: lustre-MDT0000-lwp-OST0001: Communicating with 10.10.16.120@tcp, operation obd_ping failed with -107.
            15:51:18:Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 10.10.16.120@tcp) was lost; in progress operations using this service will wait for recovery to complete
            

            Then a umount of OST0006, which never completes:

            15:52:09:Lustre: 7404:0:(client.c:1897:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1380581484/real 1380581484]  req@ffff8800634d5800 x1447637766224616/t0(0) o250->MGC10.10.16.120@tcp@10.10.16.120@tcp:26/25 lens 400/544 e 0 to 1 dl 1380581500 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            15:52:09:Lustre: lustre-OST0006 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 5. Is it stuck?
            

            From the MDT console log:

            16:51:27:Lustre: DEBUG MARKER: == sanity test 132: som avoids glimpse rpc == 15:50:26 (1380581426)
            16:51:27:LustreError: 11-0: lustre-OST0006-osc-MDT0000: Communicating with 10.10.16.121@tcp, operation ost_connect failed with -16.
            16:51:27:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param mdt.*.som
            16:51:27:LustreError: 11-0: lustre-OST0006-osc-MDT0000: Communicating with 10.10.16.121@tcp, operation ost_connect failed with -16.
            16:51:27:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.mdt.som=enabled
            16:51:27:Lustre: Setting parameter lustre-MDT0000.mdt.som in log lustre-MDT0000
            16:51:27:Lustre: Skipped 5 previous similar messages
            16:51:27:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
            16:51:27:Lustre: DEBUG MARKER: umount -d -f /mnt/mds1
            16:51:27:LustreError: 3509:0:(client.c:1076:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88004efcd000 x1447637735940204/t0(0) o13->lustre-OST0000-osc-MDT0000@10.10.16.121@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
            16:51:27:LustreError: 3509:0:(client.c:1076:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88004efcd000 x1447637735940208/t0(0) o13->lustre-OST0002-osc-MDT0000@10.10.16.121@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
            16:51:27:LustreError: 3509:0:(client.c:1076:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88004efcd000 x1447637735940216/t0(0) o6->lustre-OST0003-osc-MDT0000@10.10.16.121@tcp:28/4 lens 664/432 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
            16:51:27:LustreError: 3509:0:(client.c:1076:ptlrpc_import_delay_req()) Skipped 1 previous similar message
            16:51:27:Lustre: lustre-MDT0000: Not available for connect from 10.10.16.121@tcp (stopping)
            16:51:27:Lustre: lustre-MDT0000: Not available for connect from 10.10.16.121@tcp (stopping)
            16:51:27:LustreError: 3508:0:(client.c:1076:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8800569b5400 x1447637735940228/t0(0) o13->lustre-OST0004-osc-MDT0000@10.10.16.121@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
            16:51:27:Lustre: 15981:0:(client.c:1897:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1380581444/real 1380581444]  req@ffff8800569b5400 x1447637735940248/t0(0) o251->MGC10.10.16.120@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1380581450 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
            16:51:27:LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.10.16.121@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
            16:51:27:Lustre: server umount lustre-MDT0000 complete
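
            A decoding aid for the error numbers scattered through these console logs: they are ordinary Linux errno values, so the "-16" in "ost_connect failed with -16" is EBUSY and matches the OST-side "refused reconnection, still busy with 1 active RPCs" message, the "-2" from ost_blocking_ast() is ENOENT, "-19" is ENODEV, and "-107" is ENOTCONN. The following small, self-contained C snippet (no Lustre headers; the list is just the codes seen in this ticket) prints them with their descriptions:

            /* Print the errno values that appear in the logs above. */
            #include <errno.h>
            #include <stdio.h>
            #include <string.h>

            int main(void)
            {
                const int codes[] = { ENOENT, EBUSY, ENODEV, EINVAL, ENOTCONN };

                for (size_t i = 0; i < sizeof(codes) / sizeof(codes[0]); i++)
                    printf("-%d: %s\n", codes[i], strerror(codes[i]));
                return 0;
            }

            On Linux this prints -2, -16, -19, -22 and -107 together with their strerror() descriptions.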
            

            From debug log on OST:

            ...
            1380410772.384659:(ldlm_lock.c:454:lock_handle_free()) slab-freed 'lock': 504 at ffff880025067c80.
            1380410772.386661:(ldlm_lock.c:454:lock_handle_free()) slab-freed 'lock': 504 at ffff88002583e380.
            1380410831.744886:(ofd_objects.c:563:ofd_attr_get()) Process entered
            1380410831.744887:(ofd_objects.c:588:ofd_attr_get()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
            1380410831.744889:(lprocfs_jobstats.c:217:lprocfs_job_stats_log()) Process entered
            1380410831.744890:(lprocfs_jobstats.c:224:lprocfs_job_stats_log()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
            1380410831.744891:(ofd_obd.c:1456:ofd_sync()) Process leaving
            1380410831.744892:(lustre_fid.h:719:fid_flatten32()) Process leaving (rc=4279240389 : 4279240389 : ff1006c5)
            1380410831.744893:(lustre_fid.h:719:fid_flatten32()) Process leaving (rc=4279240389 : 4279240389 : ff1006c5)
            1380410831.744897:(ofd_dev.c:285:ofd_object_free()) Process entered
            1380410831.744897:(ofd_dev.c:289:ofd_object_free()) object free, fid = [0x100000000:0x17c5:0x0]
            1380410831.744898:(ofd_dev.c:293:ofd_object_free()) slab-freed '(of)': 160 at ffff880026e3e9f0.
            1380410831.744899:(ofd_dev.c:294:ofd_object_free()) Process leaving
            1380410831.744899:(obd_class.h:1326:obd_sync()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
            1380410831.744900:(ost_handler.c:1775:ost_blocking_ast()) Error -2 syncing data on lock cancel
            1380410831.745806:(ost_handler.c:1777:ost_blocking_ast()) slab-freed '((oa))': 208 at ffff88002690ca40.
            1380410831.745808:(ost_handler.c:1778:ost_blocking_ast()) kfreed 'oinfo': 112 at ffff880026b61140.
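
            For reading the "(rc=18446744073709551614 : -2 : fffffffffffffffe)" triples in the debug log above: they are one return value printed three ways, as an unsigned 64-bit integer, as a signed integer, and in hex, i.e. (u64)(-2) = 2^64 - 2. A tiny stand-alone C check of that relationship (nothing Lustre-specific):

            #include <inttypes.h>
            #include <stdio.h>

            int main(void)
            {
                int64_t rc = -2;    /* -ENOENT, as returned by ofd_attr_get() above */

                /* Reproduces the debug-log rendering
                 * "rc=18446744073709551614 : -2 : fffffffffffffffe". */
                printf("rc=%" PRIu64 " : %" PRId64 " : %" PRIx64 "\n",
                       (uint64_t)rc, rc, (uint64_t)rc);
                return 0;
            }

            The rc=18446744073709551594 : -22 line from lprocfs_job_stats_log() is the same encoding for -EINVAL (2^64 - 22).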
            

            People

              Assignee: Nathaniel Clark (utopiabound)
              Reporter: Maloo (maloo)
              Votes: 0
              Watchers: 13
