Lustre / LU-18850

sanity/900 Crash with cfs_hash_for_each_relax+0x17b/0x480 [obdclass]

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      This issue was created by maloo for Arshad <arshad.hussain@aeoncomputing.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/aec6b1c3-5f88-425a-8bf5-50415f59f012

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/111944 - 4.18.0-553.44.1.el8_10.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/111944 - 4.18.0-553.44.1.el8_lustre.x86_64

      Crash while executing sanity test 900, during umount:

      [17722.241646] LustreError: MGC10.240.28.46@tcp: Connection to MGS (at 10.240.28.46@tcp) was lost; in progress operations using this service will fail
      [17727.828813] Lustre: lustre-MDT0001: Not available for connect from 10.240.28.46@tcp (stopping)
      [17729.469790] Lustre: lustre-MDT0001: Not available for connect from 10.240.24.216@tcp (stopping)
      [17729.471698] Lustre: Skipped 7 previous similar messages
      :
      [17771.667290] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [umount:573821]
      [17772.371426] CPU: 1 PID: 573821 Comm: umount 4.18.0-553.44.1.el8_lustre.x86_64 #1
      [17772.522202] RIP: 0010:cfs_hash_for_each_relax+0x17b/0x480 [obdclass]
      [17773.230656] Call Trace:
      [17774.042684] ? cleanup_resource+0x350/0x350 [ptlrpc]
      [17774.044248] ? cfs_hash_for_each_relax+0x17b/0x480 [obdclass]
      [17774.045444] ? cfs_hash_for_each_relax+0x172/0x480 [obdclass]
      [17774.046634] ? cleanup_resource+0x350/0x350 [ptlrpc]
      [17774.047719] ? cleanup_resource+0x350/0x350 [ptlrpc]
      [17774.048817] cfs_hash_for_each_nolock+0x124/0x200 [obdclass]
      [17774.049984] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
      [17774.262073] __ldlm_namespace_free+0x52/0x4e0 [ptlrpc]
      [17774.263285] ldlm_namespace_free_prior+0x5e/0x200 [ptlrpc]
      [17774.264623] mdt_device_fini+0x480/0xf80 [mdt]
      [17775.511739] obd_precleanup+0xf4/0x220 [obdclass]
      [17775.514029] class_cleanup+0x322/0x900 [obdclass]
      [17775.515047] class_process_config+0x3bb/0x20a0 [obdclass]
      [17775.517336] class_manual_cleanup+0x45b/0x780 [obdclass]
      [17775.518435] server_put_super+0xd62/0x11f0 [ptlrpc]
      [17775.578275] generic_shutdown_super+0x6c/0x110
      [17775.579220] kill_anon_super+0x14/0x30
      [17775.580050] deactivate_locked_super+0x34/0x70
      [17775.581003] cleanup_mnt+0x3b/0x70
      [17775.581767] task_work_run+0x8a/0xb0
      [17775.582579] exit_to_usermode_loop+0xef/0x100
      [17775.583529] do_syscall_64+0x195/0x1a0
      [17775.584330] entry_SYSCALL_64_after_hwframe+0x66/0xcb
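
      The RIP falls inside cfs_hash_for_each_relax(), which walks the hash buckets on behalf of cfs_hash_for_each_nolock() while ldlm_namespace_cleanup() iterates the namespace resource hash with the cleanup_resource callback during MDT shutdown. A soft lockup at this spot means the walk over a bucket never makes forward progress. The sketch below is purely illustrative and uses a hypothetical node type and walk_bucket() helper, not the Lustre cfs_hash code: it shows how an unbounded walk over a singly linked bucket chain would spin forever once the chain is corrupted into a cycle, and how a hop cap turns that into a detectable error rather than a watchdog report.
      {noformat}
      /*
       * Illustrative sketch only -- hypothetical 'node' type, not Lustre code.
       * An unbounded walk over a corrupted (cyclic) bucket chain never ends,
       * which in kernel context surfaces as a watchdog soft lockup.
       */
      #include <stdio.h>

      struct node {                          /* stand-in for a hash-chain entry */
              int id;
              struct node *next;
      };

      /* Walk one bucket, giving up after 'limit' hops instead of spinning. */
      static long walk_bucket(const struct node *head, unsigned long limit)
      {
              unsigned long hops = 0;
              const struct node *n;

              for (n = head; n != NULL; n = n->next)
                      if (++hops > limit)
                              return -1;     /* unbounded, this would spin forever */
              return (long)hops;
      }

      int main(void)
      {
              struct node a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };

              a.next = &b;
              b.next = &c;
              c.next = NULL;                 /* well-formed chain: a -> b -> c */
              printf("well-formed bucket: visited %ld nodes\n",
                     walk_bucket(&a, 1000));

              c.next = &b;                   /* corrupt the chain into a cycle */
              printf("corrupted bucket: walk_bucket() returned %ld (cycle)\n",
                     walk_bucket(&a, 1000));
              return 0;
      }
      {noformat}
      A livelock without memory corruption is also possible: the "relax" variant of the walk drops and retakes the bucket lock around each callback, so if cleanup_resource() repeatedly leaves (or requeues) busy entries in the same bucket the loop may never advance. Either way the symptom is the soft lockup shown above; inspecting the cfs_hash bucket in a crash dump at this RIP would help distinguish the two.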

      Attachments

        Issue Links

          Activity

            [LU-18850] sanity/900 Crash with cfs_hash_for_each_relax+0x17b/0x480 [obdclass]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to EX-8050 [ EX-8050 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-18593 [ LU-18593 ]
            arshad512 Arshad Hussain made changes -
            Assignee Original: WC Triage [ wc-triage ] New: Arshad Hussain [ arshad512 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to EX-1913 [ EX-1913 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-8792 [ LU-8792 ]
            adilger Andreas Dilger made changes -
            Description updated: the quoted console log was trimmed, dropping the "Modules linked in" list, the register dump, and the IRQ/timer frames, leaving the soft-lockup message, RIP, and umount call trace shown in the Description above.
            adilger Andreas Dilger made changes -
            Description updated: the leading MGS connection-loss and "Not available for connect ... (stopping)" messages were added ahead of the soft-lockup trace.
            arshad512 Arshad Hussain made changes -
            Description updated: the maloo-generated report was reformatted, with the test-session links bracketed and the console log wrapped in a {noformat} block.
            maloo Maloo created issue -

            People

              Assignee: Arshad Hussain (arshad512)
              Reporter: Maloo (maloo)
              Votes: 0
              Watchers: 4
