Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10804

obdfilter-survey test_3a: lprocfs_alloc_obd_stats()) ASSERTION( obd->obd_proc_entry != ((void *)0) ) failed; LBUG

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.11.0
    • Lustre 2.11.0
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for elena <c17455@cray.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/5d126b54-2233-11e8-9ec4-52540065bddc

      [ 3072.347530] LustreError: 30713:0:(lprocfs_status_server.c:647:lprocfs_alloc_obd_stats()) ASSERTION( obd->obd_proc_entry != ((void *)0) ) failed: 
      [ 3072.351782] LustreError: 30713:0:(lprocfs_status_server.c:647:lprocfs_alloc_obd_stats()) LBUG
      [ 3072.354022] Pid: 30713, comm: lctl
      [ 3072.355821] 
      [ 3072.355821] Call Trace:
      [ 3072.359137]  [<ffffffffc06917ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      [ 3072.361177]  [<ffffffffc069183c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      [ 3072.363336]  [<ffffffffc0cbe06f>] lprocfs_alloc_obd_stats+0x18f/0x200 [obdclass]
      [ 3072.365454]  [<ffffffffc0c81730>] ? lprocfs_obd_setup+0x130/0x320 [obdclass]
      [ 3072.367602]  [<ffffffffc12b13ad>] echo_srv_device_alloc+0x48d/0x540 [obdecho]
      [ 3072.369738]  [<ffffffffc0c91cb4>] obd_setup+0x114/0x2a0 [obdclass]
      [ 3072.371892]  [<ffffffffc0c920e8>] class_setup+0x2a8/0x840 [obdclass]
      [ 3072.373960]  [<ffffffffc0c9651f>] class_process_config+0x1b3f/0x27e0 [obdclass]
      [ 3072.376145]  [<ffffffffc0c86ff9>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [ 3072.378240]  [<ffffffffc0c7053a>] class_handle_ioctl+0x56a/0x1df0 [obdclass]
      [ 3072.380342]  [<ffffffff8121f7b8>] ? destroy_inode+0x38/0x60
      [ 3072.382250]  [<ffffffff812b3ea8>] ? security_capable+0x18/0x20
      [ 3072.384178]  [<ffffffffc0c567f2>] obd_class_ioctl+0xd2/0x170 [obdclass]
      [ 3072.386089]  [<ffffffff8121730d>] do_vfs_ioctl+0x33d/0x540
      [ 3072.387929]  [<ffffffff81224a04>] ? mntput+0x24/0x40
      [ 3072.389676]  [<ffffffff81205013>] ? __fput+0x183/0x260
      [ 3072.391419]  [<ffffffff812175b1>] SyS_ioctl+0xa1/0xc0
      [ 3072.393128]  [<ffffffff816b8930>] ? system_call_after_swapgs+0x15d/0x214
      [ 3072.394862]  [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
      [ 3072.396554]  [<ffffffff816b889d>] ? system_call_after_swapgs+0xca/0x214
      [ 3072.398245] 
      [ 3072.399550] Kernel panic - not syncing: LBUG
      

      Attachments

        Issue Links

          Activity

            [LU-10804] obdfilter-survey test_3a: lprocfs_alloc_obd_stats()) ASSERTION( obd->obd_proc_entry != ((void *)0) ) failed; LBUG

            Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/31664
            Subject: LU-10804 echo: allow echo server to setup procfs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8685e8db0b55651fab21170ed75bc9df454c0e58

            gerrit Gerrit Updater added a comment - Mike Pershin (mike.pershin@intel.com) uploaded a new patch: https://review.whamcloud.com/31664 Subject: LU-10804 echo: allow echo server to setup procfs Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 8685e8db0b55651fab21170ed75bc9df454c0e58

            This is LU-8066, commit 0100ab268c and this particular change:

            diff --git a/lustre/obdecho/echo_client.c b/lustre/obdecho/echo_client.c
            index 081a339..6e25a02 100644
            --- a/lustre/obdecho/echo_client.c
            +++ b/lustre/obdecho/echo_client.c
            @@ -3071,7 +3071,7 @@ static int __init obdecho_init(void)
                     if (rc != 0)
                             goto failed_0;
             
            -       rc = class_register_type(&echo_obd_ops, NULL, true, NULL,
            +       rc = class_register_type(&echo_obd_ops, NULL, false, NULL,
                                             LUSTRE_ECHO_NAME, NULL);
                    if (rc != 0)
                            goto failed_1;
            

            that change causes empty lprocfs root for echo server device and later it asserts on it in lprocfs_alloc_obd_stats() as I can see.

            tappro Mikhail Pershin added a comment - This is LU-8066 , commit 0100ab268c and this particular change: diff --git a/lustre/obdecho/echo_client.c b/lustre/obdecho/echo_client.c index 081a339..6e25a02 100644 --- a/lustre/obdecho/echo_client.c +++ b/lustre/obdecho/echo_client.c @@ -3071,7 +3071,7 @@ static int __init obdecho_init(void) if (rc != 0) goto failed_0; - rc = class_register_type(&echo_obd_ops, NULL, true , NULL, + rc = class_register_type(&echo_obd_ops, NULL, false , NULL, LUSTRE_ECHO_NAME, NULL); if (rc != 0) goto failed_1; that change causes empty lprocfs root for echo server device and later it asserts on it in lprocfs_alloc_obd_stats() as I can see.
            sarah Sarah Liu added a comment -

            Hit this in interop testing

            https://testing.hpdd.intel.com/test_sets/baa94546-2709-11e8-b3c6-52540065bddc

             

            client: 2.10.3

            server: lustre-master tag-2.10.59

            OSS console

            [75396.029755] Lustre: DEBUG MARKER: == obdfilter-survey test 3a: Network survey ========================================================== 18:48:29 (1520966909)
            [75401.745879] Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 10.9.4.73@tcp) was lost; in progress operations using this service will wait for recovery to complete
            [75401.749951] Lustre: Skipped 9 previous similar messages
            [75410.250117] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost1' ' /proc/mounts
            [75410.567342] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost1
            [75410.784095] Lustre: server umount lustre-OST0000 complete
            [75410.950964] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            [75411.276952] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost2' ' /proc/mounts
            [75411.590521] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost2
            [75411.965347] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            [75412.289128] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost3' ' /proc/mounts
            [75412.601354] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost3
            [75412.988541] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            [75413.310862] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost4' ' /proc/mounts
            [75413.620037] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost4
            [75413.745052] Lustre: 26996:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1520966920/real 1520966920] req@ffff880037402400 x1594796080200464/t0(0) o400->MGC10.9.4.73@tcp@10.9.4.73@tcp:26/25 lens 224/224 e 0 to 1 dl 1520966927 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            [75413.751282] Lustre: 26996:0:(client.c:2100:ptlrpc_expire_one_request()) Skipped 1 previous similar message
            [75413.753466] LustreError: 166-1: MGC10.9.4.73@tcp: Connection to MGS (at 10.9.4.73@tcp) was lost; in progress operations using this service will fail
            [75413.977797] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            [75414.299315] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost5' ' /proc/mounts
            [75414.606635] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost5
            [75414.967344] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            [75415.289338] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost6' ' /proc/mounts
            [75415.598284] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost6
            [75415.793961] Lustre: server umount lustre-OST0005 complete
            [75415.795939] Lustre: Skipped 4 previous similar messages
            [75415.954904] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            [75416.274434] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost7' ' /proc/mounts
            [75416.583237] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost7
            [75416.965393] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            [75419.722863] Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d @
            [75421.172060] LustreError: 14767:0:(lprocfs_status_server.c:647:lprocfs_alloc_obd_stats()) ASSERTION( obd->obd_proc_entry != ((void *)0) ) failed: 
            [75421.176024] LustreError: 14767:0:(lprocfs_status_server.c:647:lprocfs_alloc_obd_stats()) LBUG
            [75421.178166] Pid: 14767, comm: lctl
            [75421.179941] 
            [75421.179941] Call Trace:
            [75421.207941] [<ffffffffc06ae7ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
            [75421.210086] [<ffffffffc06ae83c>] lbug_with_loc+0x4c/0xb0 [libcfs]
            [75421.212244] [<ffffffffc08c867f>] lprocfs_alloc_obd_stats+0x18f/0x200 [obdclass]
            [75421.214387] [<ffffffffc088b920>] ? lprocfs_obd_setup+0x130/0x320 [obdclass]
            [75421.237295] [<ffffffffc10a8ced>] echo_setup+0x1bd/0x2c0 [obdecho]
            [75421.273090] [<ffffffffc0ab6810>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc]
            [75421.275072] [<ffffffffc089bfda>] obd_setup+0x18a/0x2b0 [obdclass]
            [75421.276950] [<ffffffffc06c0a66>] ? cfs_hash_create+0x366/0xa20 [libcfs]
            [75421.278899] [<ffffffffc089c3a8>] class_setup+0x2a8/0x840 [obdclass]
            [75421.280725] [<ffffffffc08a081c>] class_process_config+0x1b5c/0x2810 [obdclass]
            [75421.282654] [<ffffffffc0891219>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
            [75421.284527] [<ffffffffc087a71a>] class_handle_ioctl+0x56a/0x1de0 [obdclass]
            [75421.286438] [<ffffffff8122239b>] ? destroy_inode+0x3b/0x60
            [75421.288202] [<ffffffff812b72be>] ? security_capable+0x1e/0x20
            [75421.290020] [<ffffffffc08607f2>] obd_class_ioctl+0xd2/0x170 [obdclass]
            [75421.291824] [<ffffffff81219e90>] do_vfs_ioctl+0x350/0x560
            [75421.293554] [<ffffffff812275e4>] ? mntput+0x24/0x40
            [75421.295289] [<ffffffff812079b6>] ? __fput+0x186/0x260
            [75421.296973] [<ffffffff8121a141>] SyS_ioctl+0xa1/0xc0
            [75421.298680] [<ffffffff816c0655>] ? system_call_after_swapgs+0xa2/0x146
            [75421.300454] [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
            [75421.302229] [<ffffffff816c0661>] ? system_call_after_swapgs+0xae/0x146
            [75421.304033] 
            [75421.305435] Kernel panic - not syncing: LBUG
            [75421.306428] CPU: 1 PID: 14767 Comm: lctl Tainted: G OE ------------ 3.10.0-693.21.1.el7_lustre.x86_64 #1
            [75421.306428] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
            [75421.306428] Call Trace:
            [75421.306428] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
            [75421.306428] [<ffffffff816a8634>] panic+0xe8/0x21f
            [75421.306428] [<ffffffffc06ae854>] lbug_with_loc+0x64/0xb0 [libcfs]
            [75421.306428] [<ffffffffc08c867f>] lprocfs_alloc_obd_stats+0x18f/0x200 [obdclass]
            [75421.306428] [<ffffffffc088b920>] ? lprocfs_obd_setup+0x130/0x320 [obdclass]
            [75421.306428] [<ffffffffc10a8ced>] echo_setup+0x1bd/0x2c0 [obdecho]
            [75421.306428] [<ffffffffc0ab6810>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc]
            [75421.306428] [<ffffffffc089bfda>] obd_setup+0x18a/0x2b0 [obdclass]
            [75421.306428] [<ffffffffc06c0a66>] ? cfs_hash_create+0x366/0xa20 [libcfs]
            [75421.306428] [<ffffffffc089c3a8>] class_setup+0x2a8/0x840 [obdclass]
            [75421.306428] [<ffffffffc08a081c>] class_process_config+0x1b5c/0x2810 [obdclass]
            [75421.306428] [<ffffffffc0891219>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
            [75421.306428] [<ffffffffc087a71a>] class_handle_ioctl+0x56a/0x1de0 [obdclass]
            [75421.306428] [<ffffffff8122239b>] ? destroy_inode+0x3b/0x60
            [75421.306428] [<ffffffff812b72be>] ? security_capable+0x1e/0x20
            [75421.306428] [<ffffffffc08607f2>] obd_class_ioctl+0xd2/0x170 [obdclass]
            [75421.306428] [<ffffffff81219e90>] do_vfs_ioctl+0x350/0x560
            [75421.306428] [<ffffffff812275e4>] ? mntput+0x24/0x40
            [75421.306428] [<ffffffff812079b6>] ? __fput+0x186/0x260
            [75421.306428] [<ffffffff8121a141>] SyS_ioctl+0xa1/0xc0
            [75421.306428] [<ffffffff816c0655>] ? system_call_after_swapgs+0xa2/0x146
            [75421.306428] [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
            [75421.306428] [<ffffffff816c0661>] ? system_call_after_swapgs+0xae/0x146
            [ 0.000000] Initializing cgroup subsys cpuset
            
            
            sarah Sarah Liu added a comment - Hit this in interop testing https://testing.hpdd.intel.com/test_sets/baa94546-2709-11e8-b3c6-52540065bddc   client: 2.10.3 server: lustre-master tag-2.10.59 OSS console [75396.029755] Lustre: DEBUG MARKER: == obdfilter-survey test 3a: Network survey ========================================================== 18:48:29 (1520966909) [75401.745879] Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 10.9.4.73@tcp) was lost; in progress operations using this service will wait for recovery to complete [75401.749951] Lustre: Skipped 9 previous similar messages [75410.250117] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost1' ' /proc/mounts [75410.567342] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost1 [75410.784095] Lustre: server umount lustre-OST0000 complete [75410.950964] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' [75411.276952] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost2' ' /proc/mounts [75411.590521] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost2 [75411.965347] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' [75412.289128] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost3' ' /proc/mounts [75412.601354] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost3 [75412.988541] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' [75413.310862] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost4' ' /proc/mounts [75413.620037] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost4 [75413.745052] Lustre: 26996:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1520966920/real 1520966920] req@ffff880037402400 x1594796080200464/t0(0) o400->MGC10.9.4.73@tcp@10.9.4.73@tcp:26/25 lens 224/224 e 0 to 1 dl 1520966927 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 [75413.751282] Lustre: 26996:0:(client.c:2100:ptlrpc_expire_one_request()) Skipped 1 previous similar message [75413.753466] LustreError: 166-1: MGC10.9.4.73@tcp: Connection to MGS (at 10.9.4.73@tcp) was lost; in progress operations using this service will fail [75413.977797] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' [75414.299315] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost5' ' /proc/mounts [75414.606635] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost5 [75414.967344] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' [75415.289338] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost6' ' /proc/mounts [75415.598284] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost6 [75415.793961] Lustre: server umount lustre-OST0005 complete [75415.795939] Lustre: Skipped 4 previous similar messages [75415.954904] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' [75416.274434] Lustre: DEBUG MARKER: grep -c /mnt/lustre-ost7' ' /proc/mounts [75416.583237] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost7 [75416.965393] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' [75419.722863] Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep tcp | cut -f 1 -d @ [75421.172060] LustreError: 14767:0:(lprocfs_status_server.c:647:lprocfs_alloc_obd_stats()) ASSERTION( obd->obd_proc_entry != ((void *)0) ) failed: [75421.176024] LustreError: 14767:0:(lprocfs_status_server.c:647:lprocfs_alloc_obd_stats()) LBUG [75421.178166] Pid: 14767, comm: lctl [75421.179941] [75421.179941] Call Trace: [75421.207941] [<ffffffffc06ae7ae>] libcfs_call_trace+0x4e/0x60 [libcfs] [75421.210086] [<ffffffffc06ae83c>] lbug_with_loc+0x4c/0xb0 [libcfs] [75421.212244] [<ffffffffc08c867f>] lprocfs_alloc_obd_stats+0x18f/0x200 [obdclass] [75421.214387] [<ffffffffc088b920>] ? lprocfs_obd_setup+0x130/0x320 [obdclass] [75421.237295] [<ffffffffc10a8ced>] echo_setup+0x1bd/0x2c0 [obdecho] [75421.273090] [<ffffffffc0ab6810>] ? ldlm_completion_ast+0x0/0x920 [ptlrpc] [75421.275072] [<ffffffffc089bfda>] obd_setup+0x18a/0x2b0 [obdclass] [75421.276950] [<ffffffffc06c0a66>] ? cfs_hash_create+0x366/0xa20 [libcfs] [75421.278899] [<ffffffffc089c3a8>] class_setup+0x2a8/0x840 [obdclass] [75421.280725] [<ffffffffc08a081c>] class_process_config+0x1b5c/0x2810 [obdclass] [75421.282654] [<ffffffffc0891219>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [75421.284527] [<ffffffffc087a71a>] class_handle_ioctl+0x56a/0x1de0 [obdclass] [75421.286438] [<ffffffff8122239b>] ? destroy_inode+0x3b/0x60 [75421.288202] [<ffffffff812b72be>] ? security_capable+0x1e/0x20 [75421.290020] [<ffffffffc08607f2>] obd_class_ioctl+0xd2/0x170 [obdclass] [75421.291824] [<ffffffff81219e90>] do_vfs_ioctl+0x350/0x560 [75421.293554] [<ffffffff812275e4>] ? mntput+0x24/0x40 [75421.295289] [<ffffffff812079b6>] ? __fput+0x186/0x260 [75421.296973] [<ffffffff8121a141>] SyS_ioctl+0xa1/0xc0 [75421.298680] [<ffffffff816c0655>] ? system_call_after_swapgs+0xa2/0x146 [75421.300454] [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21 [75421.302229] [<ffffffff816c0661>] ? system_call_after_swapgs+0xae/0x146 [75421.304033] [75421.305435] Kernel panic - not syncing: LBUG [75421.306428] CPU: 1 PID: 14767 Comm: lctl Tainted: G OE ------------ 3.10.0-693.21.1.el7_lustre.x86_64 #1 [75421.306428] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [75421.306428] Call Trace: [75421.306428] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b [75421.306428] [<ffffffff816a8634>] panic+0xe8/0x21f [75421.306428] [<ffffffffc06ae854>] lbug_with_loc+0x64/0xb0 [libcfs] [75421.306428] [<ffffffffc08c867f>] lprocfs_alloc_obd_stats+0x18f/0x200 [obdclass] [75421.306428] [<ffffffffc088b920>] ? lprocfs_obd_setup+0x130/0x320 [obdclass] [75421.306428] [<ffffffffc10a8ced>] echo_setup+0x1bd/0x2c0 [obdecho] [75421.306428] [<ffffffffc0ab6810>] ? ldlm_expired_completion_wait+0x220/0x220 [ptlrpc] [75421.306428] [<ffffffffc089bfda>] obd_setup+0x18a/0x2b0 [obdclass] [75421.306428] [<ffffffffc06c0a66>] ? cfs_hash_create+0x366/0xa20 [libcfs] [75421.306428] [<ffffffffc089c3a8>] class_setup+0x2a8/0x840 [obdclass] [75421.306428] [<ffffffffc08a081c>] class_process_config+0x1b5c/0x2810 [obdclass] [75421.306428] [<ffffffffc0891219>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [75421.306428] [<ffffffffc087a71a>] class_handle_ioctl+0x56a/0x1de0 [obdclass] [75421.306428] [<ffffffff8122239b>] ? destroy_inode+0x3b/0x60 [75421.306428] [<ffffffff812b72be>] ? security_capable+0x1e/0x20 [75421.306428] [<ffffffffc08607f2>] obd_class_ioctl+0xd2/0x170 [obdclass] [75421.306428] [<ffffffff81219e90>] do_vfs_ioctl+0x350/0x560 [75421.306428] [<ffffffff812275e4>] ? mntput+0x24/0x40 [75421.306428] [<ffffffff812079b6>] ? __fput+0x186/0x260 [75421.306428] [<ffffffff8121a141>] SyS_ioctl+0xa1/0xc0 [75421.306428] [<ffffffff816c0655>] ? system_call_after_swapgs+0xa2/0x146 [75421.306428] [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21 [75421.306428] [<ffffffff816c0661>] ? system_call_after_swapgs+0xae/0x146 [ 0.000000] Initializing cgroup subsys cpuset

            Looking at the assertion, this appears like it is related to changes in /proc, possibly related to /sysfs changes? Possibly James already has a patch to fix this part of the code?

            adilger Andreas Dilger added a comment - Looking at the assertion, this appears like it is related to changes in /proc, possibly related to /sysfs changes? Possibly James already has a patch to fix this part of the code?
            pjones Peter Jones added a comment -

            Mike

            Can you please investigate?

            Thanks

            Peter

            pjones Peter Jones added a comment - Mike Can you please investigate? Thanks Peter

            This was not observed in the latest master test results because of test_3a was not functional, see LU-7420 :

            > this test passes at Maloo, seems it does just nothing.

            LU-7420 patch https://review.whamcloud.com/#/c/18443/ fixes the test to be executed. The reported issue hit with this patch.

            egryaznova Elena Gryaznova added a comment - This was not observed in the latest master test results because of test_3a was not functional, see LU-7420 : > this test passes at Maloo, seems it does just nothing. LU-7420 patch https://review.whamcloud.com/#/c/18443/  fixes the test to be executed. The reported issue hit with this patch.
            green Oleg Drokin added a comment -

            hm, so that's all after Mar 6th round of landings, I guess.

            green Oleg Drokin added a comment - hm, so that's all after Mar 6th round of landings, I guess.

            I've reviewed all obdfilter-survey test 3a hangs over all branches starting from January 1, 2018 to today and I've found the following test sessions that have the same assertion as in this ticket:

            2018-03-09 22:59:29 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/67d21086-240d-11e8-b74b-52540065bddc

            2018-03-09 23:53:56 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/6d7aa550-240f-11e8-b3c6-52540065bddc

            2018-03-10 02:13:04 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/8d7b15dc-242e-11e8-b74b-52540065bddc

            2018-03-09 00:54:41 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/7c8f22cc-2348-11e8-9852-52540065bddc

            2018-03-08 18:53:42 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/965a62b4-231d-11e8-9852-52540065bddc

            2018-03-08 21:41:26 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/15d7653a-233e-11e8-8d2f-52540065bddc

            2018-03-08 22:23:08 UTC full-patchless – https://testing.hpdd.intel.com/test_sets/0f8acdfa-2340-11e8-88de-52540065bddc

            All are from full test sessions for our patchless Lustre builds/testing.

            jamesanunez James Nunez (Inactive) added a comment - I've reviewed all obdfilter-survey test 3a hangs over all branches starting from January 1, 2018 to today and I've found the following test sessions that have the same assertion as in this ticket: 2018-03-09 22:59:29 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/67d21086-240d-11e8-b74b-52540065bddc 2018-03-09 23:53:56 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/6d7aa550-240f-11e8-b3c6-52540065bddc 2018-03-10 02:13:04 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/8d7b15dc-242e-11e8-b74b-52540065bddc 2018-03-09 00:54:41 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/7c8f22cc-2348-11e8-9852-52540065bddc 2018-03-08 18:53:42 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/965a62b4-231d-11e8-9852-52540065bddc 2018-03-08 21:41:26 UTC full-patchless - https://testing.hpdd.intel.com/test_sets/15d7653a-233e-11e8-8d2f-52540065bddc 2018-03-08 22:23:08 UTC full-patchless – https://testing.hpdd.intel.com/test_sets/0f8acdfa-2340-11e8-88de-52540065bddc All are from full test sessions for our patchless Lustre builds/testing.
            green Oleg Drokin added a comment -

            The report referenced is for a gerrit patch that is based on another patch that seems to change relevant functionality. We looked and this sort of failure was not observed in the latest master test results, so the primary suspicion is it's the patch in question that broke things, esp. since if you look, the baseline patch is "move procfs stuff to debugfs" and the assertion you hit is in procfs-related code.

            green Oleg Drokin added a comment - The report referenced is for a gerrit patch that is based on another patch that seems to change relevant functionality. We looked and this sort of failure was not observed in the latest master test results, so the primary suspicion is it's the patch in question that broke things, esp. since if you look, the baseline patch is "move procfs stuff to debugfs" and the assertion you hit is in procfs-related code.
            spitzcor Cory Spitz added a comment -

            Obdfilter-survey ought to work very reliably.  We should consider this a major issue (or greater).

            spitzcor Cory Spitz added a comment - Obdfilter-survey ought to work very reliably.  We should consider this a major issue (or greater).

            People

              tappro Mikhail Pershin
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: