Details
- Type: Bug
- Resolution: Duplicate
- Priority: Minor
- Affects Version: Lustre 2.1.0
- Environment:
  - Lustre Branch: master
  - Lustre Build: http://newbuild.whamcloud.com/job/lustre-master/275/
  - Distro/Arch: RHEL6/x86_64
- Severity: 3
Description
While running sanity tests on a single test node, test 27q hung as follows:
== sanity test 27q: append to truncated file with all OSTs full (should error) ===== 21:56:29 (1314852989)
fail_loc=0
/mnt/lustre/d0.sanity/d27/f27q has size 80000000 OK
OSTIDX=0 MDSIDX=1
osc.lustre-OST0000-osc-MDT0000.prealloc_last_id=65
osc.lustre-OST0000-osc-MDT0000.prealloc_next_id=65
osc.lustre-OST0001-osc-MDT0000.prealloc_last_id=513
osc.lustre-OST0001-osc-MDT0000.prealloc_next_id=292
fail_val=-1
fail_loc=0x215
Creating to objid 65 on ost lustre-OST0000...
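The fail_loc/fail_val settings and preallocation counters above come from Lustre's standard fault-injection and OSC tunables. A minimal sketch of inspecting and driving the same state by hand, assuming lctl is available on the MDS node and using the device names from the log (these are not the exact commands sanity.sh runs, and the mapping of 0x215 to OBD_FAIL_OST_ENOSPC is my reading of obd_support.h from this era):

  # Inject OBD_FAIL_OST_ENOSPC (0x215) so OST object creation fails with -28 (ENOSPC)
  lctl set_param fail_val=-1
  lctl set_param fail_loc=0x215

  # Inspect the MDS-side object preallocation window for each OST
  lctl get_param osc.lustre-OST*-osc-MDT0000.prealloc_last_id
  lctl get_param osc.lustre-OST*-osc-MDT0000.prealloc_next_id

  # Clear the fault injection when done
  lctl set_param fail_loc=0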
The console log showed:
Lustre: DEBUG MARKER: == sanity test 27q: append to truncated file with all OSTs full (should error) ===== 21:56:29 (1314852989)
LustreError: 8011:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x2917 sub-object on OST idx 0/1: rc = -28
LustreError: 8011:0:(lov_request.c:569:lov_update_create_set()) Skipped 1 previous similar message
LustreError: 9113:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
LustreError: 4276:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
LustreError: 4276:0:(ldlm_lib.c:2128:target_send_reply_msg()) @@@ processing error (-28) req@ffff88060eafe400 x1378722884187257/t0(0) o-1->dda7060d-4cff-074b-9616-5a5558ff548b@NET_0x9000000000000_UUID:0/0 lens 456/0 e 0 to 0 dl 1314853018 ref 1 fl Interpret:/ffffffff/ffffffff rc 0/-1
LustreError: 11-0: an error occurred while communicating with 0@lo. The ost_write operation failed with -28
LustreError: 4151:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
LustreError: 4151:0:(libcfs_fail.h:81:cfs_fail_check_set()) Skipped 49013 previous similar messages
LustreError: 4151:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
LustreError: 4151:0:(libcfs_fail.h:81:cfs_fail_check_set()) Skipped 100123 previous similar messages
Lustre: Service thread pid 8011 was inactive for 62.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 8011, comm: mdt_04
Call Trace:
[<ffffffffa03076fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
[<ffffffffa076f79e>] lov_create+0xbce/0x1580 [lov]
[<ffffffff8105dc60>] ? default_wake_function+0x0/0x20
[<ffffffffa09a64dd>] mdd_lov_create+0xadd/0x21f0 [mdd]
[<ffffffffa09b79f2>] mdd_create+0xca2/0x1db0 [mdd]
[<ffffffffa0995208>] ? mdd_version_get+0x68/0xa0 [mdd]
[<ffffffffa0a792bc>] cml_create+0xbc/0x280 [cmm]
[<ffffffffa0a1eb7a>] mdt_reint_open+0x1bca/0x2c80 [mdt]
[<ffffffffa0a069df>] mdt_reint_rec+0x3f/0x100 [mdt]
[<ffffffffa09fee84>] mdt_reint_internal+0x6d4/0x9f0 [mdt]
[<ffffffffa09ff505>] mdt_intent_reint+0x245/0x600 [mdt]
[<ffffffffa09f7410>] mdt_intent_policy+0x3c0/0x6b0 [mdt]
[<ffffffffa0510afa>] ldlm_lock_enqueue+0x2da/0xa50 [ptlrpc]
[<ffffffffa052f305>] ? ldlm_export_lock_get+0x15/0x20 [ptlrpc]
[<ffffffffa03155e2>] ? cfs_hash_bd_add_locked+0x62/0x90 [libcfs]
[<ffffffffa05371f7>] ldlm_handle_enqueue0+0x447/0x1090 [ptlrpc]
[<ffffffffa09ea53f>] ? mdt_unpack_req_pack_rep+0xcf/0x5d0 [mdt]
[<ffffffffa09f782a>] mdt_enqueue+0x4a/0x110 [mdt]
[<ffffffffa09f2bb5>] mdt_handle_common+0x8d5/0x1810 [mdt]
[<ffffffffa0555104>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa09f3bc5>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa0565c7e>] ptlrpc_main+0xb8e/0x1900 [ptlrpc]
[<ffffffffa05650f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
[<ffffffff8100c1ca>] child_rip+0xa/0x20
[<ffffffffa05650f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
[<ffffffffa05650f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
[<ffffffff8100c1c0>] ? child_rip+0x0/0x20
LustreError: dumping log to /tmp/lustre-log.1314853064.8011
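The last console line shows the server dumping its binary debug buffer to disk. A minimal sketch of decoding such a dump with lctl for further analysis (the output path here is illustrative, not from the report):

  # Convert the binary debug dump referenced above into readable text
  lctl df /tmp/lustre-log.1314853064.8011 /tmp/lustre-log.1314853064.8011.txt

  # Alternatively, capture the current in-memory debug buffer
  lctl dk > /tmp/lustre-debug-current.txt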
Maloo report: https://maloo.whamcloud.com/test_sets/7ce0bc08-d45e-11e0-8d02-52540025f9af
This can be reproduced easily by running 'REFORMAT="--reformat" bash sanity.sh' on a single node; see the sketch below for narrowing the run to test 27q.
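To iterate on just this failure, the standard test-framework ONLY variable restricts sanity.sh to a single test. A minimal sketch, assuming the usual single-node Lustre test setup:

  # Reformat the test filesystem and run the whole sanity suite (as above)
  REFORMAT="--reformat" bash sanity.sh

  # Run only test 27q after a reformat
  REFORMAT="--reformat" ONLY=27q bash sanity.sh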
Attachments
Issue Links
- Trackbacks:
  - Lustre 2.1.0 release testing tracker: Lustre 2.1.0 RC0 Tag: v2100RC0. Created Date: 20110820. The difference between RC0 and RC1 is only a date change in lustre/ChangeLog. Lustre 2.1....