Lustre / LU-10307

obdfilter-survey test_3a: Timeout occurred after 456 mins, last suite running was obdfilter-survey, restarting cluster to continue tests


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0, Lustre 2.10.2
    • Labels: None
    • Environment: onyx, interop
      servers: el7.4, ldiskfs, branch b2_10, v2.10.2.RC1, b50
      clients: el7.3, branch b2_9, v2.9.0, b22
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      session: https://testing.hpdd.intel.com/test_sessions/9f032a71-4161-4ba2-aee0-78e2895d8180
      test set: https://testing.hpdd.intel.com/test_sets/d3d64c10-d43b-11e7-9c63-52540065bddc

      obdfilter-survey test_3a hangs while unmounting an OST. The last entry in the client test_log for test_3a is the unmount of OST1 on onyx-34vm8.

      The OST dmesg log shows:

      [24240.212638] INFO: task umount:26551 blocked for more than 120 seconds.
      [24240.213441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [24240.214247] umount          D 000000000000f908     0 26551  26550 0x00000080
      [24240.215027]  ffff880052e93ab0 0000000000000086 ffff8800128fdee0 ffff880052e93fd8
      [24240.215949]  ffff880052e93fd8 ffff880052e93fd8 ffff8800128fdee0 ffff88007c100000
      [24240.216808]  ffff880052e93ae0 00000001016e57b8 ffff88007c100000 000000000000f908
      [24240.217726] Call Trace:
      [24240.218021]  [<ffffffff816a9569>] schedule+0x29/0x70
      [24240.218527]  [<ffffffff816a6fb4>] schedule_timeout+0x174/0x2c0
      [24240.219151]  [<ffffffff81098b30>] ? internal_add_timer+0x70/0x70
      [24240.220024]  [<ffffffffc0775343>] ? dump_exports+0x143/0x150 [obdclass]
      [24240.220725]  [<ffffffffc07753fb>] obd_exports_barrier+0xab/0x1a0 [obdclass]
      [24240.221464]  [<ffffffffc0ff16bf>] ofd_device_fini+0x8f/0x2d0 [ofd]
      [24240.222134]  [<ffffffffc078d911>] class_cleanup+0x971/0xcd0 [obdclass]
      [24240.222818]  [<ffffffffc078fcad>] class_process_config+0x19cd/0x23b0 [obdclass]
      [24240.223579]  [<ffffffffc0637bc7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [24240.224275]  [<ffffffffc0790856>] class_manual_cleanup+0x1c6/0x710 [obdclass]
      [24240.225044]  [<ffffffffc07befee>] server_put_super+0x8de/0xcd0 [obdclass]
      [24240.225993]  [<ffffffff81203692>] generic_shutdown_super+0x72/0x100
      [24240.226718]  [<ffffffff81203a62>] kill_anon_super+0x12/0x20
      [24240.227416]  [<ffffffffc0793152>] lustre_kill_super+0x32/0x50 [obdclass]
      [24240.228111]  [<ffffffff81203e19>] deactivate_locked_super+0x49/0x60
      [24240.228805]  [<ffffffff81204586>] deactivate_super+0x46/0x60
      [24240.229413]  [<ffffffff812217cf>] cleanup_mnt+0x3f/0x80
      [24240.229957]  [<ffffffff81221862>] __cleanup_mnt+0x12/0x20
      [24240.230566]  [<ffffffff810ad275>] task_work_run+0xc5/0xf0
      [24240.231138]  [<ffffffff8102ab62>] do_notify_resume+0x92/0xb0
      [24240.231779]  [<ffffffff816b533d>] int_signal+0x12/0x17
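      When triaging hangs like this one, the useful frame is usually the innermost reliable frame that belongs to a Lustre module. In the trace above that is obd_exports_barrier [obdclass], called from ofd_device_fini [ofd], so umount appears to be stuck in OFD device cleanup waiting on the device's exports. The following is a minimal Python sketch, assuming the dmesg format shown above: a kernel hung-task message followed by "[<addr>] symbol+off/len [module]" frames, where a leading "?" marks an unreliable frame. The names HungTask, summarize_hung_tasks and innermost_module_frame are hypothetical helpers, not part of Lustre or its test framework.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Kernel hung-task warning, e.g.
# "INFO: task umount:26551 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task ([^\s:]+):(\d+) blocked for more than (\d+) seconds")

# Call-trace frame, e.g.
# "[<ffffffffc07753fb>] obd_exports_barrier+0xab/0x1a0 [obdclass]".
# Group 1 is the optional "? " marker the unwinder uses for unreliable frames.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


@dataclass
class HungTask:
    task: str
    pid: int
    seconds: int
    # (symbol, module) pairs, innermost first; module is None for core kernel code.
    frames: List[Tuple[str, Optional[str]]] = field(default_factory=list)


def summarize_hung_tasks(log: str) -> List[HungTask]:
    """Group the reliable call-trace frames under each hung-task warning."""
    reports: List[HungTask] = []
    current: Optional[HungTask] = None
    for line in log.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            current = HungTask(hung.group(1), int(hung.group(2)), int(hung.group(3)))
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Skip frames marked "?" since they may be stale stack contents.
        if frame and current is not None and frame.group(1) is None:
            current.frames.append((frame.group(2), frame.group(3)))
    return reports


def innermost_module_frame(report: HungTask) -> Optional[Tuple[str, str]]:
    """Return the first reliable frame that lives in a loadable module."""
    for symbol, module in report.frames:
        if module is not None:
            return symbol, module
    return None


# Excerpt of the OST dmesg log from this ticket.
SAMPLE = """\
[24240.212638] INFO: task umount:26551 blocked for more than 120 seconds.
[24240.217726] Call Trace:
[24240.218021]  [<ffffffff816a9569>] schedule+0x29/0x70
[24240.218527]  [<ffffffff816a6fb4>] schedule_timeout+0x174/0x2c0
[24240.219151]  [<ffffffff81098b30>] ? internal_add_timer+0x70/0x70
[24240.220024]  [<ffffffffc0775343>] ? dump_exports+0x143/0x150 [obdclass]
[24240.220725]  [<ffffffffc07753fb>] obd_exports_barrier+0xab/0x1a0 [obdclass]
[24240.221464]  [<ffffffffc0ff16bf>] ofd_device_fini+0x8f/0x2d0 [ofd]
[24240.222134]  [<ffffffffc078d911>] class_cleanup+0x971/0xcd0 [obdclass]
"""

if __name__ == "__main__":
    for report in summarize_hung_tasks(SAMPLE):
        where = innermost_module_frame(report)
        label = f"{where[0]} [{where[1]}]" if where else "core kernel"
        print(f"{report.task}[{report.pid}] blocked >{report.seconds}s in {label}")
```

      Run against the excerpt, it reports umount[26551] blocked in obd_exports_barrier [obdclass]. The "? dump_exports" frame is skipped rather than reported, because the unwinder could not confirm it is part of the live call chain.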
      

            People

              Assignee: Yang Sheng
              Reporter: James Casper