LU-655: sanity test 27q hung


    Description

      While running sanity tests on a single test node, test 27q hung as follows:

      == sanity test 27q: append to truncated file with all OSTs full (should error) ===== 21:56:29 (1314852989)
      fail_loc=0
      /mnt/lustre/d0.sanity/d27/f27q has size 80000000 OK
      OSTIDX=0 MDSIDX=1
      osc.lustre-OST0000-osc-MDT0000.prealloc_last_id=65
      osc.lustre-OST0000-osc-MDT0000.prealloc_next_id=65
      osc.lustre-OST0001-osc-MDT0000.prealloc_last_id=513
      osc.lustre-OST0001-osc-MDT0000.prealloc_next_id=292
      fail_val=-1
      fail_loc=0x215
      Creating to objid 65 on ost lustre-OST0000...
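
      The prealloc_* values and fail_loc/fail_val settings echoed above are tunable parameters that the test reads and writes through lctl. Below is a minimal sketch of the corresponding commands, using the parameter names taken directly from the output above and assuming they are queried on the MDS node (everything is on one node in this run); 0x215 being OBD_FAIL_OST_ENOSPC is an assumption based on the standard fail_loc definitions:

      # MDT's view of the OST0000 precreate window; equal last/next ids mean
      # no precreated objects are left on that OST
      lctl get_param osc.lustre-OST0000-osc-MDT0000.prealloc_last_id
      lctl get_param osc.lustre-OST0000-osc-MDT0000.prealloc_next_id

      # failure injection as echoed in the output above;
      # 0x215 is assumed to be OBD_FAIL_OST_ENOSPC, which forces -ENOSPC (-28)
      lctl set_param fail_val=-1
      lctl set_param fail_loc=0x215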
      

      The console log showed the following:

      Lustre: DEBUG MARKER: == sanity test 27q: append to truncated file with all OSTs full (should error) ===== 21:56:29 (1314852989)
      LustreError: 8011:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x2917 sub-object on OST idx 0/1: rc = -28
      LustreError: 8011:0:(lov_request.c:569:lov_update_create_set()) Skipped 1 previous similar message
      LustreError: 9113:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
      LustreError: 4276:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
      LustreError: 4276:0:(ldlm_lib.c:2128:target_send_reply_msg()) @@@ processing error (-28)  req@ffff88060eafe400 x1378722884187257/t0(0) o-1->dda7060d-4cff-074b-9616-5a5558ff548b@NET_0x9000000000000_UUID:0/0 lens 456/0 e 0 to 0 dl 1314853018 ref 1 fl Interpret:/ffffffff/ffffffff rc 0/-1
      LustreError: 11-0: an error occurred while communicating with 0@lo. The ost_write operation failed with -28
      LustreError: 4151:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
      LustreError: 4151:0:(libcfs_fail.h:81:cfs_fail_check_set()) Skipped 49013 previous similar messages
      LustreError: 4151:0:(libcfs_fail.h:81:cfs_fail_check_set()) *** cfs_fail_loc=215 ***
      LustreError: 4151:0:(libcfs_fail.h:81:cfs_fail_check_set()) Skipped 100123 previous similar messages
      Lustre: Service thread pid 8011 was inactive for 62.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 8011, comm: mdt_04
      
      Call Trace:
       [<ffffffffa03076fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
       [<ffffffffa076f79e>] lov_create+0xbce/0x1580 [lov]
       [<ffffffff8105dc60>] ? default_wake_function+0x0/0x20
       [<ffffffffa09a64dd>] mdd_lov_create+0xadd/0x21f0 [mdd]
       [<ffffffffa09b79f2>] mdd_create+0xca2/0x1db0 [mdd]
       [<ffffffffa0995208>] ? mdd_version_get+0x68/0xa0 [mdd]
       [<ffffffffa0a792bc>] cml_create+0xbc/0x280 [cmm]
       [<ffffffffa0a1eb7a>] mdt_reint_open+0x1bca/0x2c80 [mdt]
       [<ffffffffa0a069df>] mdt_reint_rec+0x3f/0x100 [mdt]
       [<ffffffffa09fee84>] mdt_reint_internal+0x6d4/0x9f0 [mdt]
       [<ffffffffa09ff505>] mdt_intent_reint+0x245/0x600 [mdt]
       [<ffffffffa09f7410>] mdt_intent_policy+0x3c0/0x6b0 [mdt]
       [<ffffffffa0510afa>] ldlm_lock_enqueue+0x2da/0xa50 [ptlrpc]
       [<ffffffffa052f305>] ? ldlm_export_lock_get+0x15/0x20 [ptlrpc]
       [<ffffffffa03155e2>] ? cfs_hash_bd_add_locked+0x62/0x90 [libcfs]
       [<ffffffffa05371f7>] ldlm_handle_enqueue0+0x447/0x1090 [ptlrpc]
       [<ffffffffa09ea53f>] ? mdt_unpack_req_pack_rep+0xcf/0x5d0 [mdt]
       [<ffffffffa09f782a>] mdt_enqueue+0x4a/0x110 [mdt]
       [<ffffffffa09f2bb5>] mdt_handle_common+0x8d5/0x1810 [mdt]
       [<ffffffffa0555104>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
       [<ffffffffa09f3bc5>] mdt_regular_handle+0x15/0x20 [mdt]
       [<ffffffffa0565c7e>] ptlrpc_main+0xb8e/0x1900 [ptlrpc]
       [<ffffffffa05650f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
       [<ffffffff8100c1ca>] child_rip+0xa/0x20
       [<ffffffffa05650f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
       [<ffffffffa05650f0>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
       [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
      
      LustreError: dumping log to /tmp/lustre-log.1314853064.8011
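
      The stack trace shows the mdt_04 service thread blocked in lov_create() waiting for OST object creation, while cfs_fail_loc=215 keeps the OST failing creates and writes with -28 (ENOSPC), so the create never completes. To confirm which failure-injection point 0x215 corresponds to, the fail_loc definitions in the source tree can be grepped; the path below is an assumption based on the usual Lustre tree layout:

      # look up failure-injection point 0x215 (path assumed)
      grep -n "0x215" lustre/include/obd_support.h
      # expected to match OBD_FAIL_OST_ENOSPC, i.e. the OST is forced to report ENOSPC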
      

      Maloo report: https://maloo.whamcloud.com/test_sets/7ce0bc08-d45e-11e0-8d02-52540025f9af

      This can be reproduced easily by running 'REFORMAT="--reformat" bash sanity.sh' on a single node.
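
      To iterate on this single test rather than the whole suite, sanity.sh's ONLY variable can be used; a minimal sketch, assuming the standard lustre/tests layout:

      # reformat the test filesystem and run only test 27q
      cd lustre/tests
      ONLY=27q REFORMAT="--reformat" bash sanity.sh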

            People

              Assignee: Hongchao Zhang
              Reporter: Jian Yu
              Votes: 0
              Watchers: 4
