Lustre / LU-20206

quota crash on interop with 2.15: ASSERTION( lqe->u.se.lse_pending_write == 0 )


Details

    • Bug
    • Resolution: Unresolved
    • Medium
    • Lustre 2.18.0
    • Lustre 2.18.0
    • None
    • 3

    Description

      Sometime in March, large-scale test 3a started crashing in interop runs against 2.15 servers:

      [ 6991.492050] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == large-scale test 3a: recovery time, 2 clients ========= 03:12:04 \(1777518724\)
      [ 6991.731468] Lustre: DEBUG MARKER: == large-scale test 3a: recovery time, 2 clients ========= 03:12:04 (1777518724)
      [ 7067.178608] Lustre: DEBUG MARKER: /usr/sbin/lctl mark 1 : Starting failover on mds1
      [ 7067.685548] Lustre: DEBUG MARKER: 1 : Starting failover on mds1
      [ 7068.172506] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
      [ 7068.454029] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
      [ 7068.596718] Lustre: Failing over lustre-MDT0000
      [ 7068.778816] LustreError: 534424:0:(lquota_entry.c:119:lqe_iter_cb()) ASSERTION( lqe->u.se.lse_pending_write == 0 ) failed:
      [ 7068.780650] LustreError: 534424:0:(lquota_entry.c:119:lqe_iter_cb()) LBUG
      [ 7068.781739] CPU: 0 PID: 534424 Comm: umount Kdump: loaded Tainted: G           OE     -------- -  - 4.18.0-553.117.1.el8_lustre.x86_64 #1
      [ 7068.783568] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
      [ 7068.784606] Call Trace:
      [ 7068.785049]  dump_stack+0x41/0x60
      [ 7068.785573]  lbug_with_loc.cold.7+0x5/0x43 [libcfs]
      [ 7068.786322]  lqe_iter_cb+0x133/0x140 [lquota]
      [ 7068.787005]  cfs_hash_for_each_tight+0x122/0x310 [obdclass]
      [ 7068.787908]  lquota_site_free+0xfe/0x2a0 [lquota]
      [ 7068.788628]  qsd_qtype_fini+0x352/0x460 [lquota]
      [ 7068.789343]  qsd_fini+0x1fe/0x420 [lquota]
      [ 7068.789984]  osd_shutdown+0x42/0x110 [osd_ldiskfs]
      [ 7068.790736]  osd_process_config+0x32e/0x380 [osd_ldiskfs]
      [ 7068.791564]  lod_process_config+0x205/0xe80 [lod]
      [ 7068.792298]  ? srso_alias_return_thunk+0x5/0xfcdfd
      [ 7068.793027]  ? wait_for_completion+0xd2/0x100
      [ 7068.793694]  mdd_process_config+0x201/0x660 [mdd]
      [ 7068.794439]  mdt_stack_fini+0x351/0xa80 [mdt]
      [ 7068.795159]  mdt_device_fini+0xa3e/0xe90 [mdt]
      [ 7068.795869]  obd_precleanup.isra.33+0x8e/0x280 [obdclass]
      [ 7068.796783]  ? srso_alias_return_thunk+0x5/0xfcdfd
      [ 7068.797704]  ? class_disconnect_exports+0x197/0x300 [obdclass]
      [ 7068.798867]  class_cleanup+0x32b/0x7e0 [obdclass]
      [ 7068.799816]  class_process_config+0x3bb/0x20a0 [obdclass]
      [ 7068.800899]  class_manual_cleanup+0x2ab/0x780 [obdclass]
      [ 7068.801744]  ? srso_alias_return_thunk+0x5/0xfcdfd
      [ 7068.802570]  server_put_super+0xdb5/0x14a0 [ptlrpc]
      [ 7068.803728]  ? srso_alias_return_thunk+0x5/0xfcdfd
      [ 7068.804581]  ? srso_alias_return_thunk+0x5/0xfcdfd
      [ 7068.805320]  ? fsnotify_sb_delete+0x138/0x1c0
      [ 7068.806003]  generic_shutdown_super+0x6c/0x110
      [ 7068.806873]  kill_anon_super+0x14/0x30
      [ 7068.807570]  deactivate_locked_super+0x34/0x70
      [ 7068.808353]  cleanup_mnt+0x3b/0x70
      [ 7068.808903]  task_work_run+0x8a/0xb0
      [ 7068.809462]  exit_to_usermode_loop+0xf4/0x100
      [ 7068.810286]  do_syscall_64+0x1cb/0x1d0
      [ 7068.811014]  entry_SYSCALL_64_after_hwframe+0x66/0xcb
      [ 7068.811946] RIP: 0033:0x7fa7efea68fb
      [ 7068.812578] Code: ff d0 48 89 c7 b8 3c 00 00 00 0f 05 48 8b 0d 84 65 39 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d 65 39 00 f7 d8 64 89 01 48
      [ 7068.815959] RSP: 002b:00007ffe7abe6498 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
      [ 7068.817303] RAX: 0000000000000000 RBX: 0000558221e369c0 RCX: 00007fa7efea68fb
      [ 7068.818500] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000558221e3c3c0
      [ 7068.819746] RBP: 0000000000000000 R08: 0000558221e3cf20 R09: 0000558221e31010
      [ 7068.821055] R10: 0000000000000000 R11: 0000000000000246 R12: 0000558221e3c3c0
      [ 7068.822378] R13: 00007fa7f0d29184 R14: 0000558221e36ba0 R15: 00000000ffffffff
      [ 7068.823764] Kernel panic - not syncing: LBUG 

      Example failures:

      First recorded failure: https://testing.whamcloud.com/test_sets/2c90a510-d83e-4017-9f7d-ff2cb78ea315

      https://testing.whamcloud.com/test_sets/435eee2f-daf8-449c-973f-64473d2d65a1

      https://testing.whamcloud.com/test_sets/3edd37f5-669e-4fb9-8471-af7692a993bb

      https://testing.whamcloud.com/test_sets/c1d24572-e40d-4acd-b78f-174783b2dec2

      https://testing.whamcloud.com/test_sets/ca3ff105-53ae-4901-8ceb-26367a0db604

      https://testing.whamcloud.com/test_sets/bb0699b0-87c8-4d6f-ab55-f6b74c017d45

      https://testing.whamcloud.com/test_sets/8552ec19-5d4a-4baa-912c-94e0373451ed

          People

            hongchao.zhang Hongchao Zhang
            green Oleg Drokin