  1. Lustre
  2. LU-2615

group of OSS crashed at umount

Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.3
    • Severity: 3
    • Rank: 6118

    Description

      Four OSSes crashed at the same time, at umount, with the following backtrace:

      PID: 18173 TASK: ffff8803376dc040 CPU: 4 COMMAND: "umount"
      #0 [ffff8802b115f8d0] machine_kexec at ffffffff8102895b
      #1 [ffff8802b115f930] crash_kexec at ffffffff810a4622
      #2 [ffff8802b115fa00] panic at ffffffff81484657
      #3 [ffff8802b115fa80] lbug_with_loc at ffffffffa04ade5b [libcfs]
      #4 [ffff8802b115faa0] llog_recov_thread_stop at ffffffffa072e55b [ptlrpc]
      #5 [ffff8802b115fad0] llog_recov_thread_fini at ffffffffa072e593 [ptlrpc]
      #6 [ffff8802b115faf0] filter_llog_finish at ffffffffa0c7d3dd [obdfilter]
      #7 [ffff8802b115fb20] obd_llog_finish at ffffffffa057c2f8 [obdclass]
      #8 [ffff8802b115fb40] filter_precleanup at ffffffffa0c7cdaf [obdfilter]
      #9 [ffff8802b115fba0] class_cleanup at ffffffffa05a3ca7 [obdclass]
      #10 [ffff8802b115fc20] class_process_config at ffffffffa05a5feb [obdclass]
      #11 [ffff8802b115fcb0] class_manual_cleanup at ffffffffa05a6d29 [obdclass]
      #12 [ffff8802b115fd70] server_put_super at ffffffffa05b2c0c [obdclass]
      #13 [ffff8802b115fe40] generic_shutdown_super at ffffffff8116542b
      #14 [ffff8802b115fe60] kill_anon_super at ffffffff81165546
      #15 [ffff8802b115fe80] lustre_kill_super at ffffffffa05a8966 [obdclass]
      #16 [ffff8802b115fea0] deactivate_super at ffffffff811664e0
      #17 [ffff8802b115fec0] mntput_no_expire at ffffffff811826bf
      #18 [ffff8802b115fef0] sys_umount at ffffffff81183188
      #19 [ffff8802b115ff80] system_call_fastpath at ffffffff810030f2
      RIP: 00007f62ddfbdd67 RSP: 00007fffab738308 RFLAGS: 00010202
      RAX: 00000000000000a6 RBX: ffffffff810030f2 RCX: 0000000000000010
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f62deeb3bb0
      RBP: 00007f62deeb3b80 R8: 00007f62deeb3bd0 R9: 0000000000000000
      R10: 00007fffab738130 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 00007f62deeb3c10
      ORIG_RAX: 00000000000000a6 CS: 0033 SS: 002b

      This backtrace is identical to the one shown in LU-1194, which is supposed to be fixed in 2.1.3.
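
      For anyone who wants to check mechanically that two backtraces match, here is a minimal sketch. It assumes both traces are saved as plain text in the crash utility's bt frame layout shown above. The helper names (call_chain, same_call_chain) are hypothetical and not part of crash or Lustre. Only function and module names are compared, since stack and text addresses can differ between nodes and builds.

```python
import re

# One frame line of crash-utility `bt` output, for example:
#   #3 [ffff8802b115fa80] lbug_with_loc at ffffffffa04ade5b [libcfs]
# The leading '#' is optional so that reformatted pastes still parse.
FRAME_RE = re.compile(
    r"^\s*#?(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)


def call_chain(bt_text):
    """Return [(function, module_or_None), ...] in frame order."""
    chain = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line)
        if m:  # header and register lines do not match and are skipped
            chain.append((m.group(2), m.group(3)))
    return chain


def same_call_chain(bt_a, bt_b):
    """True if both traces go through the same functions in the same order."""
    return call_chain(bt_a) == call_chain(bt_b)


# A few frames copied from this ticket, used only as sample input.
sample = """\
PID: 18173 TASK: ffff8803376dc040 CPU: 4 COMMAND: "umount"
#2 [ffff8802b115fa00] panic at ffffffff81484657
#3 [ffff8802b115fa80] lbug_with_loc at ffffffffa04ade5b [libcfs]
#4 [ffff8802b115faa0] llog_recov_thread_stop at ffffffffa072e55b [ptlrpc]
#5 [ffff8802b115fad0] llog_recov_thread_fini at ffffffffa072e593 [ptlrpc]
"""

for func, module in call_chain(sample):
    print(func, module or "(kernel)")
```

      Running same_call_chain() on this trace and the one from LU-1194 would confirm whether the two crashes go through the same code path, without needing the binary dump.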

      The site is classified, so I can't upload the crash dump, but I can export the contents of some structures on request.

      Attachments

        1. ptlrpcd.c
          32 kB
        2. recov_thread.c
          24 kB

          People

            hongchao.zhang Hongchao Zhang
            louveta Alexandre Louvet (Inactive)