Lustre / LU-9582

gssnull instability

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.9.0
    • Severity: 3

    Description

      While trying to run sanity-gss, I found the gssnull flavor to be very unstable on the server side.
      Once the lsvcgssd daemon is started with -z on the server and the flavor is set to gssnull (lctl conf_param <fsname>.srpc.flavor.default=gssnull), connections between nodes are authenticated. Stack traces similar to the following are then dumped on the server:

       [ 535.556541] WARNING: at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
       [ 535.556885] list_del corruption. prev->next should be ffff8803fa1a3bd0, but was ffff880405b71f58
       [ 535.557043] Modules linked in: ptlrpc_gss(OF) sunrpc osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) libcfs(OF) ldiskfs(OF) loop mbcache jbd2 sha512_generic ppdev pcspkr parport_pc parport i2c_piix4 i2c_core serio_raw virtio_balloon xfs libcrc32c sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi virtio_scsi 8139too ata_piix 8139cp libata mii virtio_pci virtio_ring virtio floppy [last unloaded: libcfs]
       [ 535.557043] CPU: 5 PID: 3378 Comm: mdt00_003 Tainted: GF O-------------- 3.10.0-229.20.1.el7.x86_64 #1
       [ 535.557043] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
       [ 535.557043] ffff8803fa1a3ac8 00000000ace999ec ffff8803fa1a3a80 ffffffff816045b6
       [ 535.557043] ffff8803fa1a3ab8 ffffffff8106e29b ffff8803fa1a3bd0 ffff8803fa1a3bb8
       [ 535.557043] 0000000000000246 0000000000000000 ffff8803eecf25a0 ffff8803fa1a3b20
       [ 535.557043] Call Trace:
       [ 535.557043] [<ffffffff816045b6>] dump_stack+0x19/0x1b
       [ 535.557043] [<ffffffff8106e29b>] warn_slowpath_common+0x6b/0xb0
       [ 535.557043] [<ffffffff8106e33c>] warn_slowpath_fmt+0x5c/0x80
       [ 535.557043] [<ffffffff8107eda0>] ? __internal_add_timer+0x130/0x130
       [ 535.557043] [<ffffffff812ed9f1>] __list_del_entry+0xa1/0xd0
       [ 535.557043] [<ffffffff812eda2d>] list_del+0xd/0x30
       [ 535.557043] [<ffffffff81098086>] remove_wait_queue+0x26/0x40
       [ 535.557043] [<ffffffffa0bde99f>] gss_svc_upcall_handle_init+0x25f/0xee0 [ptlrpc_gss]
       [ 535.557043] [<ffffffff810a9510>] ? wake_up_state+0x20/0x20
       [ 535.557043] [<ffffffffa0bd0c49>] gss_svc_handle_init+0x7e9/0xb60 [ptlrpc_gss]
       [ 535.557043] [<ffffffffa0bd70db>] gss_svc_accept+0x81b/0xad0 [ptlrpc_gss]
       [ 535.557043] [<ffffffffa0bebf18>] gss_svc_accept_kr+0x18/0x20 [ptlrpc_gss]
       [ 535.557043] [<ffffffffa062f70e>] sptlrpc_svc_unwrap_request+0xee/0x600 [ptlrpc]
       [ 535.557043] [<ffffffffa060f594>] ptlrpc_main+0x964/0x1de0 [ptlrpc]
       [ 535.557043] [<ffffffffa060ec30>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
       [ 535.557043] [<ffffffff8109727f>] kthread+0xcf/0xe0
       [ 535.557043] [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
       [ 535.557043] [<ffffffff81614358>] ret_from_fork+0x58/0x90
       [ 535.557043] [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
      

      followed by this message:

       [ 535.571130] Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound).
      

      This pattern is repeated several times, until a GPF occurs:

       [ 996.052879] general protection fault: 0000 [#1] SMP
       [ 996.053003] Modules linked in: ptlrpc_gss(OF) sunrpc osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) libcfs(OF) ldiskfs(OF) loop mbcache jbd2 sha512_generic ppdev pcspkr parport_pc parport i2c_piix4 i2c_core serio_raw virtio_balloon xfs libcrc32c sd_mod crc_t10dif crct10dif_common ata_generic pata_acpi virtio_scsi 8139too ata_piix 8139cp libata mii virtio_pci virtio_ring virtio floppy [last unloaded: libcfs]
       [ 996.053003] CPU: 5 PID: 2951 Comm: mdt_out00_001 Tainted: GF W O-------------- 3.10.0-229.20.1.el7.x86_64 #1
       [ 996.053003] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
       [ 996.053003] task: ffff8800da83a220 ti: ffff8803fc070000 task.ti: ffff8803fc070000
       [ 996.053003] RIP: 0010:[<ffffffff812e29e6>] [<ffffffff812e29e6>] memcpy+0x16/0x110
       [ 996.053003] RSP: 0018:ffff8803fc073998 EFLAGS: 00010202
       [ 996.053003] RAX: ffffc9000e8c3000 RBX: ffff8803fc0739f8 RCX: ffff880406762300
       [ 996.053003] RDX: 000000005a5a5a1a RSI: 5a5a5a5a5a5a5a5a RDI: ffffc9000e8c3000
       [ 996.053003] RBP: ffff8803fc0739b8 R08: 0000000000000000 R09: ffffea000e9cfc80
       [ 996.053003] R10: 0000000000004120 R11: fffffffffffffff8 R12: ffff8803ee020808
       [ 996.053003] R13: ffff8803fc05d050 R14: 0000000000000000 R15: ffff8803f0b18e40
       [ 996.053003] FS: 0000000000000000(0000) GS:ffff88041fd40000(0000) knlGS:0000000000000000
       [ 996.053003] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       [ 996.053003] CR2: 00007f6eb788aba0 CR3: 0000000036483000 CR4: 00000000000006e0
       [ 996.053003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [ 996.053003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       [ 996.053003] Stack:
       [ 996.053003] ffffffffa0be03dd ffff8803fc0739c8 ffff8803ee020780 ffff8803fc05d050
       [ 996.053003] ffff8803fc073b70 ffffffffa0bdcd90 0000000000000000 0000000000000000
       [ 996.053003] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
       [ 996.053003] Call Trace:
       [ 996.053003] [<ffffffffa0be03dd>] ? rawobj_dup+0x15d/0x2e0 [ptlrpc_gss]
       [ 996.053003] [<ffffffffa0bdcd90>] gss_svc_searchbyctx+0x40/0xa0 [ptlrpc_gss]
       [ 996.053003] [<ffffffffa0bdc870>] ? rsc_alloc+0xc0/0xc0 [ptlrpc_gss]
       [ 996.053003] [<ffffffffa0bdecc5>] gss_svc_upcall_handle_init+0x585/0xee0 [ptlrpc_gss]
       [ 996.053003] [<ffffffff810a9510>] ? wake_up_state+0x20/0x20
       [ 996.053003] [<ffffffffa0bd0c49>] gss_svc_handle_init+0x7e9/0xb60 [ptlrpc_gss]
       [ 996.053003] [<ffffffffa0bd70db>] gss_svc_accept+0x81b/0xad0 [ptlrpc_gss]
       [ 996.053003] [<ffffffffa0bebf18>] gss_svc_accept_kr+0x18/0x20 [ptlrpc_gss]
       [ 996.053003] [<ffffffffa062f70e>] sptlrpc_svc_unwrap_request+0xee/0x600 [ptlrpc]
       [ 996.053003] [<ffffffffa060f594>] ptlrpc_main+0x964/0x1de0 [ptlrpc]
       [ 996.053003] [<ffffffffa060ec30>] ? ptlrpc_register_service+0xe40/0xe40 [ptlrpc]
       [ 996.053003] [<ffffffff8109727f>] kthread+0xcf/0xe0
       [ 996.053003] [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
       [ 996.053003] [<ffffffff81614358>] ret_from_fork+0x58/0x90
       [ 996.053003] [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
       [ 996.053003] Code: 00 00 00 00 00 e8 fb fb ff ff eb e2 90 90 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 <4c> 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 4c 89
       [ 996.053003] RIP [<ffffffff812e29e6>] memcpy+0x16/0x110
       [ 996.053003] RSP <ffff8803fc073998>
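
One observation on the GPF register dump: RSI holds 5a5a5a5a5a5a5a5a and RDX holds 000000005a5a5a1a. Assuming 0x5a is the byte Lustre writes over freed memory (my assumption, not something confirmed in this report), that would be consistent with rawobj_dup copying from an object that had already been freed. The Code: line shows two "48 83 ea 20" sequences (sub $0x20,%rdx) before the faulting load, so the original memcpy length would have been 0x5a5a5a1a + 0x40 = 0x5a5a5a5a. Below is a minimal Python sketch for flagging such registers in an oops dump. The function names and the three-byte threshold are hypothetical choices made for illustration only.

```python
import re

# Assumption: 0x5a is the fill byte used to poison freed memory.
POISON_BYTE = 0x5A

# Matches general-purpose registers such as "RDX: 000000005a5a5a1a".
# RSP/RIP carry a segment prefix ("0018:...") and CRn/DRn do not start
# at a word boundary, so none of those are matched.
REG_RE = re.compile(r"\b(R[A-Z0-9]{2}):\s*([0-9a-f]{16})\b")

# Two register lines copied from the GPF dump in this report.
SAMPLE_DUMP = """\
[ 996.053003] RAX: ffffc9000e8c3000 RBX: ffff8803fc0739f8 RCX: ffff880406762300
[ 996.053003] RDX: 000000005a5a5a1a RSI: 5a5a5a5a5a5a5a5a RDI: ffffc9000e8c3000
"""


def poisoned_bytes(value: int) -> int:
    """Count how many of the 8 bytes in a 64-bit value equal POISON_BYTE."""
    return sum(1 for b in value.to_bytes(8, "big") if b == POISON_BYTE)


def find_poisoned_registers(oops_text: str, min_bytes: int = 3) -> dict:
    """Return {register: hex value} for registers with >= min_bytes poison bytes."""
    hits = {}
    for name, hexval in REG_RE.findall(oops_text):
        if poisoned_bytes(int(hexval, 16)) >= min_bytes:
            hits[name] = hexval
    return hits


if __name__ == "__main__":
    for reg, val in find_poisoned_registers(SAMPLE_DUMP).items():
        print(f"{reg} looks poisoned: {val}")
```

On the two sample lines this flags RDX and RSI only. The threshold of three bytes is deliberately loose so that a length which has been partly decremented, as RDX appears to be here, is still caught.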
      

            People

              Assignee: Sebastien Buisson
              Reporter: Sebastien Buisson (Inactive)
              Votes: 0
              Watchers: 4
