Lustre / LU-3326

recovery-mds-scale test_failover_ost: tar: Cannot open: No space left on device



    Description

      After recovery-mds-scale test_failover_ost had been running for 1.5 hours (the OST had failed over 6 times), the client load on one of the clients failed as follows:

      <snip>
      tar: etc/mail/submit.cf: Cannot open: No space left on device
      tar: etc/mail/trusted-users: Cannot open: No space left on device
      tar: etc/mail/virtusertable: Cannot open: No space left on device
      tar: etc/mail/access: Cannot open: No space left on device
      tar: etc/mail/aliasesdb-stamp: Cannot open: No space left on device
      tar: etc/gssapi_mech.conf: Cannot open: No space left on device
      tar: Exiting with failure status due to previous errors
      

      The console log on the client (client-32vm6) showed:

      19:40:31:INFO: task tar:2790 blocked for more than 120 seconds.
      19:40:31:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      19:40:31:tar           D 0000000000000000     0  2790   2788 0x00000080
      19:40:31: ffff88004eb73a28 0000000000000082 ffff88004eb739d8 ffff88007c24fe50
      19:40:31: 0000000000000286 0000000000000003 0000000000000001 0000000000000286
      19:40:31: ffff88007bcb3ab8 ffff88004eb73fd8 000000000000fb88 ffff88007bcb3ab8
      19:40:31:Call Trace:
      19:40:31: [<ffffffffa03d775a>] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
      19:40:31: [<ffffffff8150ea05>] schedule_timeout+0x215/0x2e0
      19:40:31: [<ffffffffa068517c>] ? ptlrpc_request_bufs_pack+0x5c/0x80 [ptlrpc]
      19:40:31: [<ffffffffa069a770>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
      19:40:31: [<ffffffff8150e683>] wait_for_common+0x123/0x180
      19:40:31: [<ffffffff81063310>] ? default_wake_function+0x0/0x20
      19:40:31: [<ffffffff8150e79d>] wait_for_completion+0x1d/0x20
      19:40:31: [<ffffffffa08cbf6c>] osc_io_setattr_end+0xbc/0x190 [osc]
      19:40:31: [<ffffffffa095cde0>] ? lov_io_end_wrapper+0x0/0x100 [lov]
      19:40:31: [<ffffffffa055cf30>] cl_io_end+0x60/0x150 [obdclass]
      19:40:31: [<ffffffffa055d7e0>] ? cl_io_start+0x0/0x140 [obdclass]
      19:40:31: [<ffffffffa095ced1>] lov_io_end_wrapper+0xf1/0x100 [lov]
      19:40:31: [<ffffffffa095c86e>] lov_io_call+0x8e/0x130 [lov]
      19:40:31: [<ffffffffa095e3bc>] lov_io_end+0x4c/0xf0 [lov]
      19:40:31: [<ffffffffa055cf30>] cl_io_end+0x60/0x150 [obdclass]
      19:40:31: [<ffffffffa0561f92>] cl_io_loop+0xc2/0x1b0 [obdclass]
      19:40:31: [<ffffffffa0a2aa08>] cl_setattr_ost+0x208/0x2c0 [lustre]
      19:40:31: [<ffffffffa09f8b0e>] ll_setattr_raw+0x9ce/0x1000 [lustre]
      19:40:31: [<ffffffffa09f919b>] ll_setattr+0x5b/0xf0 [lustre]
      19:40:31: [<ffffffff8119e708>] notify_change+0x168/0x340
      19:40:31: [<ffffffff811b284c>] utimes_common+0xdc/0x1b0
      19:40:31: [<ffffffff811828d1>] ? __fput+0x1a1/0x210
      19:40:31: [<ffffffff811b29fe>] do_utimes+0xde/0xf0
      19:40:31: [<ffffffff811b2b12>] sys_utimensat+0x32/0x90
      19:40:31: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
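
      When reading a hung-task trace like the one above, it can help to drop the frames marked with "?" (entries the kernel found on the stack but could not confirm as part of the call chain) and look at what remains. The following is a minimal sketch of that filtering, not part of the original report; the helper names reliable_frames and first_module_frame are hypothetical, and the sample input is a subset of the console log lines quoted above.

```python
import re

# Matches kernel stack frames such as
#   19:40:31: [<ffffffffa08cbf6c>] osc_io_setattr_end+0xbc/0x190 [osc]
# A "?" before the symbol marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(console_log):
    """Return (function, module) pairs for frames not marked with '?'.

    Frames without a trailing [module] tag are reported as 'kernel'.
    """
    frames = []
    for line in console_log.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("unreliable"):
            frames.append((match.group("func"), match.group("module") or "kernel"))
    return frames


def first_module_frame(frames):
    """Return the innermost frame that belongs to a loadable module, or None."""
    for func, module in frames:
        if module != "kernel":
            return (func, module)
    return None


# Subset of the console log quoted in this report.
SAMPLE = """\
19:40:31: [<ffffffffa03d775a>] ? cfs_waitq_signal+0x1a/0x20 [libcfs]
19:40:31: [<ffffffff8150ea05>] schedule_timeout+0x215/0x2e0
19:40:31: [<ffffffff8150e683>] wait_for_common+0x123/0x180
19:40:31: [<ffffffff8150e79d>] wait_for_completion+0x1d/0x20
19:40:31: [<ffffffffa08cbf6c>] osc_io_setattr_end+0xbc/0x190 [osc]
"""

if __name__ == "__main__":
    for func, module in reliable_frames(SAMPLE):
        print(f"{module:10s} {func}")
```

      Applied to the full trace, this kind of filter leaves the chain from sys_utimensat down through ll_setattr and cl_setattr_ost to osc_io_setattr_end, where tar is waiting for a completion.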
      

      Maloo report: https://maloo.whamcloud.com/test_sets/053120d2-bb19-11e2-8824-52540035b04c


            People

              hongchao.zhang Hongchao Zhang
              yujian Jian Yu
