Lustre / LU-1121

recovery-mds-scale (FLAVOR=OSS): tar: Wrote only 4096 of 7168 bytes

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Blocker
    • None
    • Affects Version/s: Lustre 2.1.1, Lustre 2.1.3
    • None
    • 3
    • 3993

    Description

      recovery-mds-scale with FLAVOR=OSS failed as follows:

      ==== Checking the clients loads AFTER  failover -- failure NOT OK
      ost3 has failed over 1 times, and counting...
      sleeping 582 seconds ... 
      tar: etc/selinux/targeted/modules/active/modules/sandbox.pp: Wrote only 4096 of 7168 bytes
      tar: Exiting with failure status due to previous errors
      Found the END_RUN_FILE file: /home/yujian/test_logs/end_run_file
      client-1-ib
      Client load failed on node client-1-ib
      
      client client-1-ib load stdout and debug files :
                    /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib
                    /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib.debug
      
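      Judging from the output above, a failing client load appears to record its node name in END_RUN_FILE, and the test driver reports that node once the file exists. The following bash sketch illustrates that handshake under this assumption; the function names report_load_failure and check_end_run_file are hypothetical and are not taken from the Lustre test framework.

```bash
#!/usr/bin/env bash
# Sketch of the END_RUN_FILE handshake (hypothetical helper names).
# A failing client load records its node name; the driver stops the run
# as soon as the file exists and reports the first node listed in it.

# Called by a client load script (e.g. the tar load) when its workload fails.
# $1: path of the shared END_RUN_FILE, $2: name of the failing client node.
report_load_failure() {
    local end_run_file=$1 node=$2
    echo "$node" >> "$end_run_file"
}

# Called periodically by the driver. Returns 0 if no client has failed;
# otherwise prints the report and returns 1.
check_end_run_file() {
    local end_run_file=$1 node
    [ -e "$end_run_file" ] || return 0
    echo "Found the END_RUN_FILE file: $end_run_file"
    cat "$end_run_file"
    # The first line names the first client that reported a failure.
    read -r node < "$end_run_file"
    echo "Client load failed on node $node"
    return 1
}
```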

      /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib:

      tar: etc/selinux/targeted/modules/active/modules/sandbox.pp: Wrote only 4096 of 7168 bytes
      tar: Exiting with failure status due to previous errors
      

      /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib.debug:

      <~snip~>
      2012-02-18 22:30:41: tar run starting
      + mkdir -p /mnt/lustre/d0.tar-client-1-ib
      + cd /mnt/lustre/d0.tar-client-1-ib
      + wait 7567
      + do_tar
      + tar cf - /etc
      + tar xf -
      + tee /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib
      tar: Removing leading `/' from member names
      + return 2
      + RC=2
      ++ grep 'exit delayed from previous errors' /tmp/recovery-mds-scale.log_run_tar.sh-client-1-ib
      + PREV_ERRORS=
      + true
      + '[' 2 -ne 0 -a '' -a '' ']'
      + '[' 2 -eq 0 ']'
      ++ date '+%F %H:%M:%S'
      + echoerr '2012-02-18 22:37:10: tar failed'
      + echo '2012-02-18 22:37:10: tar failed'
      2012-02-18 22:37:10: tar failed
      <~snip~>
      
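      The xtrace above suggests how the tar load decides pass or fail: RC holds the extracting tar's exit status (2), PREV_ERRORS holds any output line matching 'exit delayed from previous errors', and the failure is only tolerated when RC is non-zero and both string operands of the first test are non-empty. In this run tar printed 'Exiting with failure status due to previous errors' instead, so the grep matched nothing, PREV_ERRORS stayed empty, and the load was reported as failed. A bash sketch of that decision follows; only RC and PREV_ERRORS appear in the trace, while ERRORS_OK (standing in for the other empty operand) and classify_tar_result are hypothetical names, and explicit -n tests replace the trace's -a form.

```bash
#!/usr/bin/env bash
# Sketch of the pass/fail decision seen in the run_tar.sh xtrace.
# ERRORS_OK and classify_tar_result are hypothetical names.

# $1: exit status of the extracting tar (2 in the failed run)
# $2: path of the captured tar output
# $3: non-empty if delayed tar errors may be tolerated (hypothetical ERRORS_OK)
classify_tar_result() {
    local RC=$1 log=$2 ERRORS_OK=$3 PREV_ERRORS
    # Empty unless tar reported only errors it deferred to the end of the run.
    PREV_ERRORS=$(grep 'exit delayed from previous errors' "$log") || true
    if [ "$RC" -ne 0 ] && [ -n "$ERRORS_OK" ] && [ -n "$PREV_ERRORS" ]; then
        RC=0    # tolerated: errors allowed and only delayed errors seen
    fi
    if [ "$RC" -eq 0 ]; then
        echo "$(date '+%F %H:%M:%S'): tar succeeded"
    else
        echo "$(date '+%F %H:%M:%S'): tar failed"
    fi
    return "$RC"
}
```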

      The syslog on client node client-1-ib showed the following:

      Feb 18 22:34:54 client-1 kernel: INFO: task flush-lustre-1:3510 blocked for more than 120 seconds.
      Feb 18 22:34:54 client-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Feb 18 22:34:54 client-1 kernel: flush-lustre- D 0000000000000000     0  3510      2 0x00000080
      Feb 18 22:34:54 client-1 kernel: ffff8801f70e99a0 0000000000000046 ffff8801f70e9920 ffffffffa0942434
      Feb 18 22:34:54 client-1 kernel: 0000000000000000 ffff880331d24980 ffff8801f70e9930 0000000000000000
      Feb 18 22:34:54 client-1 kernel: ffff88027d12b0b8 ffff8801f70e9fd8 000000000000f4e8 ffff88027d12b0b8
      Feb 18 22:34:54 client-1 kernel: Call Trace:
      Feb 18 22:34:54 client-1 kernel: [<ffffffffa0942434>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
      Feb 18 22:34:54 client-1 kernel: [<ffffffff8109b809>] ? ktime_get_ts+0xa9/0xe0
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81110b10>] ? sync_page+0x0/0x50
      Feb 18 22:34:54 client-1 kernel: [<ffffffff814ed1c3>] io_schedule+0x73/0xc0
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81110b4d>] sync_page+0x3d/0x50
      Feb 18 22:34:54 client-1 kernel: [<ffffffff814eda2a>] __wait_on_bit_lock+0x5a/0xc0
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81110ae7>] __lock_page+0x67/0x70
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81090c30>] ? wake_bit_function+0x0/0x50
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81124c97>] ? __writepage+0x17/0x40
      Feb 18 22:34:54 client-1 kernel: [<ffffffff811261f2>] write_cache_pages+0x392/0x4a0
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81052600>] ? __dequeue_entity+0x30/0x50
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81124c80>] ? __writepage+0x0/0x40
      Feb 18 22:34:54 client-1 kernel: [<ffffffff8126a5c9>] ? cpumask_next_and+0x29/0x50
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81054754>] ? find_busiest_group+0x244/0xb20
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81126324>] generic_writepages+0x24/0x30
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81126351>] do_writepages+0x21/0x40
      Feb 18 22:34:54 client-1 kernel: [<ffffffff811a046d>] writeback_single_inode+0xdd/0x2c0
      Feb 18 22:34:54 client-1 kernel: [<ffffffff811a08ae>] writeback_sb_inodes+0xce/0x180
      Feb 18 22:34:54 client-1 kernel: [<ffffffff811a0a0b>] writeback_inodes_wb+0xab/0x1b0
      Feb 18 22:34:54 client-1 kernel: [<ffffffff811a0dab>] wb_writeback+0x29b/0x3f0
      Feb 18 22:34:54 client-1 kernel: [<ffffffff814eca20>] ? thread_return+0x4e/0x77e
      Feb 18 22:34:54 client-1 kernel: [<ffffffff8107cc02>] ? del_timer_sync+0x22/0x30
      Feb 18 22:34:54 client-1 kernel: [<ffffffff811a1099>] wb_do_writeback+0x199/0x240
      Feb 18 22:34:54 client-1 kernel: [<ffffffff811a11a3>] bdi_writeback_task+0x63/0x1b0
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81090ab7>] ? bit_waitqueue+0x17/0xd0
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81134d40>] ? bdi_start_fn+0x0/0x100
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81134dc6>] bdi_start_fn+0x86/0x100
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81134d40>] ? bdi_start_fn+0x0/0x100
      Feb 18 22:34:54 client-1 kernel: [<ffffffff81090886>] kthread+0x96/0xa0
      Feb 18 22:34:54 client-1 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
      Feb 18 22:34:54 client-1 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
      Feb 18 22:34:54 client-1 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
      
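      The hung-task report is timestamped 22:34:54, inside the tar run window from 22:30:41 (tar run starting) to 22:37:10 (tar failed); with the 120-second threshold, flush-lustre-1 had been blocked since 22:32:54 or earlier. The bash/awk sketch below lists such reports within a given window, assuming the classic "Mon DD HH:MM:SS host kernel:" syslog prefix and a window that does not cross midnight; hung_tasks_in_window is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: print "HH:MM:SS task:pid" for each hung-task report whose syslog
# timestamp falls inside [start, end]. Reads syslog lines on stdin.
# Assumes "Mon DD HH:MM:SS host kernel: INFO: task NAME:PID blocked ..." lines.

# $1: window start (HH:MM:SS), $2: window end (HH:MM:SS)
hung_tasks_in_window() {
    local start=$1 end=$2
    awk -v s="$start" -v e="$end" '
        /blocked for more than [0-9]+ seconds/ {
            # $3 is HH:MM:SS; fixed-width times compare correctly as strings.
            # $8 is the "NAME:PID" field after "INFO: task".
            if ($3 >= s && $3 <= e) print $3, $8
        }'
}
```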

      Maloo report: https://maloo.whamcloud.com/test_sets/f3b4fe94-5af9-11e1-8801-5254004bbbd3

      Please refer to the attached recovery-oss-scale.1329633991.log.tar.bz2 for more logs.

      This appears to be the same issue as LU-874.


    People

      green Oleg Drokin
      yujian Jian Yu
      Votes: 0
      Watchers: 5
