Lustre / LU-10123

Repeated timeouts during mount() - MDS and client

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.11.0
    • Environment: Soak cluster, lustre-master build 3654, version=2.10.54_13_g84f690e
    • Severity: 3

    Description

      Attempting to mount the filesystem, which is configured with DNE.
      The MGT/MDS node hangs on the initial mount.
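
      For reference, the mounts involved are of the following general form; the device path, mount points, and MGS NID below are illustrative placeholders, not taken from the soak configuration:

      # Server target mount on the MGS/MDS node (device name is an assumption)
      mount -t lustre /dev/mapper/<mdt-device> /mnt/soaked-mdt0
      # Client mount of the "soaked" filesystem (<mgs-nid> is a placeholder)
      mount -t lustre <mgs-nid>@o2ib:/soaked /mnt/soaked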

      Oct 16 16:38:01 soak-8 kernel: LDISKFS-fs warning (device dm-0): ldiskfs_multi_mount_protect:322: MMP interval 42 higher than expected, please wait.
      Oct 16 16:38:44 soak-8 kernel: LDISKFS-fs (dm-0): recovery complete
      Oct 16 16:38:44 soak-8 kernel: LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,user_xattr,no_mbcache,nodelalloc
      Oct 16 16:38:45 soak-8 kernel: Lustre: MGS: Connection restored to e7ad9453-d165-e9e1-75ba-304460ab678f (at 0@lo)
      Oct 16 16:38:50 soak-8 kernel: Lustre: 12855:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1508171925/real 1508171925]  req@ffff88081c348c00 x1581432791105776/t0(0) o8->soaked-OST0001-osc-MDT0000@192.168.1.103@o2ib:28/4 lens 520/544 e 0 to 1 dl 1508171930 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Oct 16 16:39:10 soak-8 kernel: LustreError: 137-5: soaked-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      Oct 16 16:39:15 soak-8 kernel: Lustre: 12855:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1508171950/real 1508171950]  req@ffff880814820c00 x1581432791105856/t0(0) o8->soaked-OST0000-osc-MDT0000@192.168.1.103@o2ib:28/4 lens 520/544 e 0 to 1 dl 1508171955 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Oct 16 16:39:15 soak-8 kernel: Lustre: 12855:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
      Oct 16 16:39:24 soak-8 sssd_be: GSSAPI client step 1
      Oct 16 16:39:24 soak-8 sssd_be: GSSAPI client step 1
      Oct 16 16:39:24 soak-8 sssd_be: GSSAPI client step 1
      Oct 16 16:39:24 soak-8 sssd_be: GSSAPI client step 2
      Oct 16 16:39:45 soak-8 kernel: Lustre: 12855:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1508171975/real 1508171975]  req@ffff880814821800 x1581432791105936/t0(0) o38->soaked-MDT0003-osp-MDT0000@192.168.1.111@o2ib:24/4 lens 520/544 e 0 to 1 dl 1508171985 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Oct 16 16:39:45 soak-8 kernel: Lustre: 12855:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
      Oct 16 16:40:00 soak-8 kernel: LustreError: 137-5: soaked-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      Oct 16 16:40:01 soak-8 systemd: Started Session 7 of user root.
      Oct 16 16:40:01 soak-8 systemd: Starting Session 7 of user root.
      Oct 16 16:40:01 soak-8 CROND[13027]: (root) CMD (/usr/lib64/sa/sa1 1 1)
      Oct 16 16:40:10 soak-8 kernel: Lustre: 12855:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1508172000/real 1508172000]  req@ffff880814822d00 x1581432791106048/t0(0) o8->soaked-OST0000-osc-MDT0000@192.168.1.103@o2ib:28/4 lens 520/544 e 0 to 1 dl 1508172010 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Oct 16 16:40:10 soak-8 kernel: Lustre: 12855:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
      Oct 16 16:40:40 soak-8 kernel: Lustre: 12855:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1508172025/real 1508172025]  req@ffff880814823c00 x1581432791106128/t0(0) o38->soaked-MDT0003-osp-MDT0000@192.168.1.111@o2ib:24/4 lens 520/544 e 0 to 1 dl 1508172040 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Oct 16 16:40:40 soak-8 kernel: Lustre: 12855:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
      Oct 16 16:40:50 soak-8 kernel: LustreError: 137-5: soaked-MDT0001_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      Oct 16 16:40:53 soak-8 kernel: INFO: task mount.lustre:12918 blocked for more than 120 seconds.
      Oct 16 16:40:54 soak-8 kernel: mount.lustre    D 0000000000000001     0 12918  12917 0x00000082
      Oct 16 16:40:54 soak-8 kernel: ffff8804b9bbb7b0 0000000000000082 ffff88083f9e0000 ffff8804b9bbbfd8
      Oct 16 16:40:54 soak-8 kernel: ffff8804b9bbbfd8 ffff8804b9bbbfd8 ffff88083f9e0000 ffff8808203a8150
      Oct 16 16:40:54 soak-8 kernel: 7fffffffffffffff ffff8808203a8148 ffff88083f9e0000 0000000000000001
      Oct 16 16:40:54 soak-8 kernel: Call Trace:
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff816a94c9>] schedule+0x29/0x70
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff810c9ef9>] ? select_task_rq_fair+0x549/0x700
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff816a987d>] wait_for_completion+0xfd/0x140
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a2f6b4>] llog_process_or_fork+0x244/0x450 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a2f8d4>] llog_process+0x14/0x20 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a5fe45>] class_config_parse_llog+0x125/0x350 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc10235a8>] mgc_process_cfg_log+0x788/0xc40 [mgc]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc1026413>] mgc_process_log+0x3d3/0x890 [mgc]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a68550>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc1026b18>] ? do_config_log_add+0x248/0x580 [mgc]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc10279f0>] mgc_process_config+0x890/0x13f0 [mgc]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a6bfee>] lustre_process_log+0x2de/0xaf0 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc090dba7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a57339>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a98c32>] server_start_targets+0x1352/0x2a70 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a6ef5d>] ? lustre_start_mgc+0x20d/0x2800 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a68550>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a9b3dd>] server_fill_super+0x108d/0x184c [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a71ff8>] lustre_fill_super+0x328/0x950 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a71cd0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff81204b0d>] mount_nodev+0x4d/0xb0
      Oct 16 16:40:54 soak-8 kernel: [<ffffffffc0a69e18>] lustre_mount+0x38/0x60 [obdclass]
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff81205599>] mount_fs+0x39/0x1b0
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff81222077>] vfs_kern_mount+0x67/0x110
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff81224583>] do_mount+0x233/0xaf0
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff8118760e>] ? __get_free_pages+0xe/0x40
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff812251c6>] SyS_mount+0x96/0xf0
      Oct 16 16:40:54 soak-8 kernel: [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
      

      Power-cycled soak-8; the mount completed after the power cycle. Continued bringing up the filesystem.
      soak-10 (MDT0003) hung during mount.

      Oct 16 16:57:37 soak-10 kernel: INFO: task mount.lustre:13350 blocked for more than 120 seconds.
      Oct 16 16:57:37 soak-10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Oct 16 16:57:38 soak-10 kernel: mount.lustre    D 0000000000000001     0 13350  13349 0x00000082
      Oct 16 16:57:38 soak-10 kernel: ffff880817bdf7b0 0000000000000086 ffff880817783f40 ffff880817bdffd8
      Oct 16 16:57:38 soak-10 kernel: ffff880817bdffd8 ffff880817bdffd8 ffff880817783f40 ffff88082ce87c30
      Oct 16 16:57:38 soak-10 kernel: 7fffffffffffffff ffff88082ce87c28 ffff880817783f40 0000000000000001
      Oct 16 16:57:38 soak-10 kernel: Call Trace:
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff816a94c9>] schedule+0x29/0x70
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff810c76f5>] ? sched_clock_cpu+0x85/0xc0
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff810c12c8>] ? check_preempt_curr+0x78/0xa0
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff810c1309>] ? ttwu_do_wakeup+0x19/0xd0
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff816a987d>] wait_for_completion+0xfd/0x140
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc0a406b4>] llog_process_or_fork+0x244/0x450 [obdclass]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc0a408d4>] llog_process+0x14/0x20 [obdclass]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc0a70e45>] class_config_parse_llog+0x125/0x350 [obdclass]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc14505a8>] mgc_process_cfg_log+0x788/0xc40 [mgc]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc1453413>] mgc_process_log+0x3d3/0x890 [mgc]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc1453b77>] ? do_config_log_add+0x2a7/0x580 [mgc]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc14549f0>] mgc_process_config+0x890/0x13f0 [mgc]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc0a7cfee>] lustre_process_log+0x2de/0xaf0 [obdclass]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc091b4e4>] ? libcfs_log_return+0x24/0x30 [libcfs]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc0aa9c32>] server_start_targets+0x1352/0x2a70 [obdclass]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc0a80259>] ? lustre_start_mgc+0x509/0x2800 [obdclass]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc0a79550>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc0aac3dd>] server_fill_super+0x108d/0x184c [obdclass]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc0a82ff8>] lustre_fill_super+0x328/0x950 [obdclass]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc0a82cd0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff81204b0d>] mount_nodev+0x4d/0xb0
      Oct 16 16:57:38 soak-10 kernel: [<ffffffffc0a7ae18>] lustre_mount+0x38/0x60 [obdclass]
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff81205599>] mount_fs+0x39/0x1b0
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff81222077>] vfs_kern_mount+0x67/0x110
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff81224583>] do_mount+0x233/0xaf0
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff8118760e>] ? __get_free_pages+0xe/0x40
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff812251c6>] SyS_mount+0x96/0xf0
      Oct 16 16:57:38 soak-10 kernel: [<ffffffff816b4fc9>] system_call_fastpath+0x16/0x1b
      

      Power-cycled that node and continued with the mounts; the second mount attempt on soak-10 worked.
      While mounting the clients, soak-40 hung.

      Oct 16 17:17:56 soak-40 kernel: INFO: task mount.lustre:2933 blocked for more than 120 seconds.
      Oct 16 17:17:56 soak-40 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      Oct 16 17:17:56 soak-40 kernel: mount.lustre    D 0000000000000001     0  2933   2932 0x00000080
      Oct 16 17:17:56 soak-40 kernel: ffff88107f827940 0000000000000086 ffff88085c2e5ee0 ffff88107f827fd8
      Oct 16 17:17:56 soak-40 kernel: ffff88107f827fd8 ffff88107f827fd8 ffff88085c2e5ee0 ffff88085cb5c5d0
      Oct 16 17:17:57 soak-40 kernel: 7fffffffffffffff ffff88085cb5c5c8 ffff88085c2e5ee0 0000000000000001
      Oct 16 17:17:57 soak-40 kernel: Call Trace:
      Oct 16 17:17:57 soak-40 kernel: [<ffffffff816a94e9>] schedule+0x29/0x70
      Oct 16 17:17:57 soak-40 kernel: [<ffffffff816a6ff9>] schedule_timeout+0x239/0x2c0
      Oct 16 17:17:57 soak-40 kernel: [<ffffffff810c76f5>] ? sched_clock_cpu+0x85/0xc0
      Oct 16 17:17:57 soak-40 kernel: [<ffffffff810c12c8>] ? check_preempt_curr+0x78/0xa0
      Oct 16 17:17:57 soak-40 kernel: [<ffffffff810c1309>] ? ttwu_do_wakeup+0x19/0xd0
      Oct 16 17:17:57 soak-40 kernel: [<ffffffff816a989d>] wait_for_completion+0xfd/0x140
      Oct 16 17:17:57 soak-40 kernel: [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
      Oct 16 17:17:57 soak-40 kernel: [<ffffffffc0ba66b4>] llog_process_or_fork+0x244/0x450 [obdclass]
      Oct 16 17:17:57 soak-40 kernel: [<ffffffffc0ba68d4>] llog_process+0x14/0x20 [obdclass]
      Oct 16 17:17:57 soak-40 kernel: [<ffffffffc0bd6d65>] class_config_parse_llog+0x125/0x350 [obdclass]
      Oct 16 17:17:57 soak-40 kernel: [<ffffffffc0844368>] mgc_process_cfg_log+0x788/0xc40 [mgc]
      Oct 16 17:17:57 soak-40 kernel: [<ffffffffc08471e3>] mgc_process_log+0x3d3/0x890 [mgc]
      Oct 16 17:17:57 soak-40 kernel: [<ffffffffc0847947>] ? do_config_log_add+0x2a7/0x580 [mgc]
      Oct 16 17:17:57 soak-40 kernel: [<ffffffffc08487c0>] mgc_process_config+0x890/0x13f0 [mgc]
      Oct 16 17:17:58 soak-40 kernel: [<ffffffffc0be2d7e>] lustre_process_log+0x2de/0xaf0 [obdclass]
      Oct 16 17:17:58 soak-40 kernel: [<ffffffff816aba0e>] ? _raw_spin_unlock_bh+0x1e/0x20
      Oct 16 17:17:58 soak-40 kernel: [<ffffffff8132318b>] ? fprop_local_init_percpu+0x1b/0x30
      Oct 16 17:17:58 soak-40 kernel: [<ffffffffc0d7b608>] ll_fill_super+0xaf8/0x1220 [lustre]
      Oct 16 17:17:58 soak-40 kernel: [<ffffffffc0be8ce6>] lustre_fill_super+0x286/0x910 [obdclass]
      Oct 16 17:17:58 soak-40 kernel: [<ffffffffc0be8a60>] ? lustre_common_put_super+0x270/0x270 [obdclass]
      Oct 16 17:17:58 soak-40 kernel: [<ffffffff81204b0d>] mount_nodev+0x4d/0xb0
      Oct 16 17:17:58 soak-40 kernel: [<ffffffffc0be0ba8>] lustre_mount+0x38/0x60 [obdclass]
      Oct 16 17:17:58 soak-40 kernel: [<ffffffff81205599>] mount_fs+0x39/0x1b0
      Oct 16 17:17:58 soak-40 kernel: [<ffffffff81222067>] vfs_kern_mount+0x67/0x110
      Oct 16 17:17:58 soak-40 kernel: [<ffffffff81224573>] do_mount+0x233/0xaf0
      Oct 16 17:17:58 soak-40 kernel: [<ffffffff8118760e>] ? __get_free_pages+0xe/0x40
      Oct 16 17:17:58 soak-40 kernel: [<ffffffff812251b6>] SyS_mount+0x96/0xf0
      Oct 16 17:17:58 soak-40 kernel: [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
      Oct 16 17:19:59 soak-40 kernel: INFO: task mount.lustre:2933 blocked for more than 120 seconds.
      Oct 16 17:19:59 soak-40 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      

      Dumped a full-debug (-1 mask) lustre-log from the client; attached.
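
      A sketch of how such a dump is typically captured with lctl; the output file name here is an assumption:

      lctl set_param debug=-1            # enable the full debug mask
      lctl dk /tmp/lustre-log.soak-40    # dump and clear the kernel debug buffer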

Attachments

Issue Links

Activity

            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30147/
            Subject: LU-10123 lnet: ensure peer put back on dc request queue
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d0185dd433943506f434eb5cf9707c2b6807b528
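
            One way to check whether that commit is present in a given checkout (the hash is taken from the comment above) is something like:

            git merge-base --is-ancestor d0185dd433943506f434eb5cf9707c2b6807b528 HEAD \
                && echo "LU-10123 fix present"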
            pjones Peter Jones added a comment -

            It had not been, but I have just added it to the chaser list. If there is something you think is important to get landed for soak, please flag it either directly or via James so we can expedite it.

            cliffw Cliff White (Inactive) added a comment -

            We still see client mount hangs on latest master; is this patch in the queue for landing?

            bfaccini Bruno Faccini (Inactive) added a comment -

            Cliff, I have just rebased my change #30147 on latest master.

            cliffw Cliff White (Inactive) added a comment -

            We are attempting to test master now; can you rebase your patch on current master?

            cliffw Cliff White (Inactive) added a comment -

            We are doing b2_10 now and will consider this when we move back to master.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Sarah, Cliff,
            Can you give a try to the more recent version, patch-set #7, of my change #30147?
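
            For testing, a Gerrit change such as #30147 patch-set #7 can be fetched and applied along these lines; the refs path follows the standard Gerrit convention and is an assumption here, not copied from this ticket:

            git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/47/30147/7
            git cherry-pick FETCH_HEAD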
            sarah Sarah Liu added a comment -

            Hi Bruno,

            Yesterday I tried to make a patch that includes both your patch and the one from LU-10134 for soak. I checked out your patch and rebased it on master, then checked out the LU-10134 patch and rebased it on top of yours. The new patch set #2 (which hit the LBUG) should contain no changes other than the rebase onto master.

            The test patch, which includes both patches, also hit the LBUG:
            https://review.whamcloud.com/#/c/30187/


            bfaccini Bruno Faccini (Inactive) added a comment -

            Hmm, it seems that my patch has also introduced a regression and causes this new LBUG on the client side in several auto-test sub-tests/steps (lnet-selftest, lustre-initialization, upon mount, ...):

            [ 5914.956548] LNetError: 23347:0:(peer.c:272:lnet_destroy_peer_locked()) ASSERTION( list_empty(&lp->lp_peer_nets) ) failed: 
            [ 5914.960593] LNetError: 23347:0:(peer.c:272:lnet_destroy_peer_locked()) LBUG
            [ 5914.964063] Pid: 23347, comm: lnet_discovery
            [ 5914.967234] 
            [ 5914.967234] Call Trace:
            [ 5914.973027]  [<ffffffffc07337ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
            [ 5914.976440]  [<ffffffffc073383c>] lbug_with_loc+0x4c/0xb0 [libcfs]
            [ 5914.979816]  [<ffffffffc06bab42>] lnet_destroy_peer_locked+0x232/0x340 [lnet]
            [ 5914.983278]  [<ffffffffc06bb175>] lnet_peer_discovery_complete+0x295/0x340 [lnet]
            [ 5914.986724]  [<ffffffffc06c0c30>] lnet_peer_discovery+0x6f0/0x11b0 [lnet]
            [ 5914.990141]  [<ffffffff816a904d>] ? __schedule+0x39d/0x8b0
            [ 5914.993348]  [<ffffffff810b1920>] ? autoremove_wake_function+0x0/0x40
            [ 5914.996599]  [<ffffffffc06c0540>] ? lnet_peer_discovery+0x0/0x11b0 [lnet]
            [ 5914.999836]  [<ffffffff810b099f>] kthread+0xcf/0xe0
            [ 5915.002827]  [<ffffffff810b08d0>] ? kthread+0x0/0xe0
            [ 5915.005771]  [<ffffffff816b4fd8>] ret_from_fork+0x58/0x90
            [ 5915.008696]  [<ffffffff810b08d0>] ? kthread+0x0/0xe0
            

            Cliff, do you also trigger this?


            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 9
