Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9485

conf-sanity test_46a: cl_file_inode_init()) Failure to initialize cl object

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0
    • Lustre 2.10.0
    • None
    • FSTYPE=zfs
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for John Hammond <john.hammond@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/2d9acc98-3546-11e7-b0a8-5254006e85c2.

      The sub-test test_46a failed with the following error:

      test failed to respond and timed out
      

      From the client:

      01:20:15:[ 8727.094028] Lustre: Mounted lustre-client
      01:20:15:[ 8727.326631] LustreError: 14973:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
      01:20:15:[ 8727.329204] Lustre: 14973:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
      01:20:15:[ 8727.331709] Lustre: 14973:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
      01:20:15:[ 8727.334195] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
      01:20:15:[ 8727.336551] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
      01:20:15:[ 8727.338877] LustreError: 14973:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22
      01:20:15:[ 8727.343166] Lustre: 14973:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
      01:20:15:[ 8727.345641] Lustre: 14973:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
      01:20:15:[ 8727.348135] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
      01:20:15:[ 8727.350427] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
      01:20:15:[ 8727.352806] LustreError: 14973:0:(llite_lib.c:2256:ll_prep_inode()) new_inode -fatal: rc -22
      01:20:15:[ 8727.362035] BUG: unable to handle kernel paging request at ffffffa0c9b0b0ff
      01:20:15:[ 8727.363006] IP: [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
      01:20:15:[ 8727.363006] PGD 19bd067 PUD 0 
      01:20:15:[ 8727.363006] Oops: 0010 [#1] SMP 
      01:20:15:[ 8727.363006] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi ppdev crc32_pclmul ghash_clmulni_intel aesni_intel parport_pc parport lrw gf128mul glue_helper ablk_helper cryptd virtio_balloon i2c_piix4 pcspkr nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus crct10dif_pclmul crct10dif_common drm_kms_helper 8139too ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel serio_raw ttm virtio_pci virtio_ring virtio drm 8139cp i2c_core mii libata floppy [last unloaded: libcfs]
      01:20:15:[ 8727.363006] CPU: 0 PID: 14979 Comm: bash Tainted: G        W  OE  ------------   3.10.0-514.16.1.el7.x86_64 #1
      01:20:15:[ 8727.363006] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      01:20:15:[ 8727.363006] task: ffff88007ae76dd0 ti: ffff88007a240000 task.ti: ffff88007a240000
      01:20:15:[ 8727.363006] RIP: 0010:[<ffffffa0c9b0b0ff>]  [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
      01:20:15:[ 8727.363006] RSP: 0018:ffff88007fc03e78  EFLAGS: 00010286
      01:20:15:[ 8727.363006] RAX: ffffffa0c9b0b0ff RBX: ffffffff81a1a5c0 RCX: 0000000000019b00
      01:20:15:[ 8727.363006] RDX: ffff880067b8caa7 RSI: 0000000000000003 RDI: ffff880067b8caa7
      01:20:15:[ 8727.363006] RBP: ffff88007fc03ed0 R08: ffff88007b7b7e00 R09: 000000018010000f
      01:20:15:[ 8727.363006] R10: 0000000000000001 R11: ffffea0001ededc0 R12: 000000000000000a
      01:20:15:[ 8727.363006] R13: 0000000000000001 R14: ff8800658b5fb0ff R15: ffff88007fc101c0
      01:20:15:[ 8727.363006] FS:  00007f4f234d3740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
      01:20:15:[ 8727.363006] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      01:20:15:[ 8727.363006] CR2: ffffffa0c9b0b0ff CR3: 00000000782bb000 CR4: 00000000000406f0
      01:20:15:[ 8727.363006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      01:20:15:[ 8727.363006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      01:20:15:[ 8727.363006] Stack:
      01:20:15:[ 8727.363006]  ffffffff81139bbd ffff880066441800 0000000000000001 ffff88007fc101e8
      01:20:15:[ 8727.363006]  ffff88007ae76dd0 ffff88007a243fd8 0000000000000009 0000000000000009
      01:20:15:[ 8727.363006]  ffff880067d22f58 ffffffff819b3108 0000000000000001 ffff88007fc03f40
      01:20:15:[ 8727.363006] Call Trace:
      01:20:15:[ 8727.363006]  <IRQ> 
      01:20:15:[ 8727.363006]  [<ffffffff81139bbd>] ? rcu_process_callbacks+0x1dd/0x550
      01:20:15:[ 8727.363006]  [<ffffffff8108f63f>] __do_softirq+0xef/0x280
      01:20:15:[ 8727.363006]  [<ffffffff81698c1c>] call_softirq+0x1c/0x30
      01:20:15:[ 8727.363006]  [<ffffffff8102d365>] do_softirq+0x65/0xa0
      01:20:15:[ 8727.363006]  [<ffffffff8108f9d5>] irq_exit+0x115/0x120
      01:20:15:[ 8727.363006]  [<ffffffff81699895>] smp_apic_timer_interrupt+0x45/0x60
      01:20:15:[ 8727.363006]  [<ffffffff81697ddd>] apic_timer_interrupt+0x6d/0x80
      01:20:15:[ 8727.363006]  <EOI> 
      01:20:15:[ 8727.363006]  [<ffffffff8121ccf3>] ? dup_fd+0x153/0x2d0
      01:20:15:[ 8727.363006]  [<ffffffff8121ccbf>] ? dup_fd+0x11f/0x2d0
      01:20:15:[ 8727.363006]  [<ffffffff81083f70>] copy_process+0xc10/0x1960
      01:20:15:[ 8727.363006]  [<ffffffff81084e71>] do_fork+0x91/0x2c0
      01:20:16:[ 8727.363006]  [<ffffffff81085126>] SyS_clone+0x16/0x20
      01:20:16:[ 8727.363006]  [<ffffffff816974d9>] stub_clone+0x69/0x90
      01:20:16:[ 8727.363006]  [<ffffffff81697189>] ? system_call_fastpath+0x16/0x1b
      01:20:16:[ 8727.363006] Code:  Bad RIP value.
      01:20:16:[ 8727.363006] RIP  [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
      01:20:16:[ 8727.363006]  RSP <ffff88007fc03e78>
      01:20:16:[ 8727.363006] CR2: ffffffa0c9b0b0ff
      

      https://testing.hpdd.intel.com/test_logs/2f0f1c78-3546-11e7-b0a8-5254006e85c2/show_text

      Info required for matching: conf-sanity 46a

      Attachments

        Activity

          [LU-9485] conf-sanity test_46a: cl_file_inode_init()) Failure to initialize cl object
          sarah Sarah Liu added a comment -

          Bobi,

          Your debug patch is based on Jinshan's https://review.whamcloud.com/#/c/27148/, with his patch, it is not easy to reproduce. So I added the line back to test-framework.sh and reproduce it.

          running: conf-sanity ONLY=46a 
          run_suite conf-sanity /usr/lib64/lustre/tests/conf-sanity.sh
          [15069.537681] Lustre: DEBUG MARKER: -----============= acceptance-small: conf-sanity ============----- Thu Jun 15 21:38:44 UTC 2017
          [15069.902165] Lustre: DEBUG MARKER: excepting tests: 32newtarball 101
          [15070.034791] Lustre: Unmounted lustre-client
          [15119.059428] Key type lgssc unregistered
          [15119.059933] Lustre: 18766:0:(gss_mech_switch.c:81:lgss_mech_unregister()) Unregister krb5 mechanism
          [15124.614091] LNet: Removed LNI 10.9.6.136@tcp
          [15125.631080] LNet: HW nodes: 1, HW CPU cores: 4, npartitions: 1
          [15125.634023] alg: No test for adler32 (adler32-zlib)
          [15125.634592] alg: No test for crc32 (crc32-table)
          [15126.534615] Lustre: DEBUG MARKER: == conf-sanity test 46a: handle ost additional - wide striped file =================================== 21:39:40 (1497562780)
          [15128.207906] Lustre: Lustre: Build Version: 2.9.59_16_g233342f
          [15128.232085] LNet: Added LNI 10.9.6.136@tcp [8/256/0/180]
          [15128.232728] LNet: Accept secure, port 988
          [15129.739865] Lustre: 19557:0:(gss_svc_upcall.c:1178:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active
          [15129.741378] Lustre: 19557:0:(gss_mech_switch.c:72:lgss_mech_register()) Register gssnull mechanism
          [15129.742225] Key type lgssc registered
          [15129.794743] Lustre: Echo OBD driver; http://www.lustre.org/
          [15167.777159] Lustre: Mounted lustre-client
          [15173.996457] Lustre: DEBUG MARKER: trevis-58vm1.trevis.hpdd.intel.com: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff8800b4796800.ost_server_uuid 40
          [15174.200433] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-ffff8800b4796800.ost_server_uuid in FULL state after 0 sec
          [15174.277151] Lustre: Mounted lustre-client
          [15174.378436] LustreError: 21918:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
          [15174.379473] Lustre: 21918:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
          [15174.380433] Lustre: 21918:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
          [15174.381483] Lustre: 21918:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
          [15174.382314] Lustre: 21918:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
          [15174.383184] LustreError: 21918:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22
          [15174.384297] LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) ASSERTION( rc == 0 ) failed: 
          [15174.385191] LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) LBUG
          
          Message from syslogd@trevis-58vm1 at Jun 15 21:40:29 ...
           kernel:LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) ASSERTION( rc == 0 ) failed: 
          
          Message from syslogd@trevis-58vm1 at Jun 15 21:40:29 ...
           kernel:LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) LBUG
          [15174.387462] Pid: 21918, comm: stat
          [15174.387797] 
          [15174.387797] Call Trace:
          [15174.388182]  [<ffffffffa05467ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
          [15174.388817]  [<ffffffffa054687c>] lbug_with_loc+0x4c/0xb0 [libcfs]
          [15174.389508]  [<ffffffffa0b6577f>] ll_update_inode+0x36f/0x620 [lustre]
          [15174.390133]  [<ffffffff81219483>] ? inode_sb_list_add+0x43/0x50
          [15174.390717]  [<ffffffffa0b65a97>] ll_read_inode2+0x67/0x420 [lustre]
          [15174.391397]  [<ffffffffa0b741fb>] ll_iget+0xab/0x350 [lustre]
          [15174.391956]  [<ffffffffa0b67ad2>] ll_prep_inode+0x232/0xc80 [lustre]
          [15174.392599]  [<ffffffffa08cb580>] ? lustre_msg_buf+0x0/0x60 [ptlrpc]
          [15174.393298]  [<ffffffffa0b73a20>] ? ll_md_blocking_ast+0x0/0x730 [lustre]
          [15174.393957]  [<ffffffffa08cf4e0>] ? lustre_swab_mdt_body+0x0/0x140 [ptlrpc]
          [15174.394632]  [<ffffffffa0b749a9>] ll_lookup_it_finish+0x109/0x1090 [lustre]
          [15174.395373]  [<ffffffffa060cc10>] ? lmv_intent_lock+0x12d0/0x1b50 [lmv]
          [15174.395998]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
          [15174.396518]  [<ffffffffa0b74504>] ? ll_i2gids+0x24/0xb0 [lustre]
          [15174.397147]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
          [15174.397660]  [<ffffffffa0b73a20>] ? ll_md_blocking_ast+0x0/0x730 [lustre]
          [15174.398373]  [<ffffffffa0b761ce>] ll_lookup_it+0x89e/0xee0 [lustre]
          [15174.398983]  [<ffffffffa0b77b9b>] ll_lookup_nd+0xbb/0x190 [lustre]
          [15174.399579]  [<ffffffff81208add>] lookup_real+0x1d/0x50
          [15174.400159]  [<ffffffff81209452>] __lookup_hash+0x42/0x60
          [15174.400684]  [<ffffffff816841f2>] lookup_slow+0x42/0xa7
          [15174.401252]  [<ffffffff8120c973>] path_lookupat+0x773/0x7a0
          [15174.401783]  [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0
          [15174.402346]  [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0
          [15174.402945]  [<ffffffff8120c9cb>] filename_lookup+0x2b/0xc0
          [15174.403480]  [<ffffffff812103e7>] user_path_at_empty+0x67/0xc0
          [15174.404092]  [<ffffffff812104eb>] ? do_filp_open+0x4b/0xb0
          [15174.404623]  [<ffffffff81210451>] user_path_at+0x11/0x20
          [15174.405174]  [<ffffffff812038c3>] vfs_fstatat+0x63/0xc0
          [15174.405679]  [<ffffffff81203e91>] SYSC_newlstat+0x31/0x60
          [15174.406259]  [<ffffffff81692e66>] ? trace_do_page_fault+0x56/0x150
          [15174.406853]  [<ffffffff8169250b>] ? do_async_page_fault+0x1b/0xd0
          [15174.407435]  [<ffffffff8168eff8>] ? async_page_fault+0x28/0x30
          [15174.408068]  [<ffffffff8120411e>] SyS_newlstat+0xe/0x10
          [15174.408581]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
          [15174.409197] 
          [15174.409364] Kernel panic - not syncing: LBUG
          [15174.409818] CPU: 3 PID: 21918 Comm: stat Tainted: G           OE  ------------   3.10.0-514.21.1.el7.x86_64 #1
          [15174.410752] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
          [15174.411297]  ffffffffa0564e8b 000000000694263c ffff880081abf820 ffffffff81686f13
          [15174.412073]  ffff880081abf8a0 ffffffff8168031a ffffffff00000008 ffff880081abf8b0
          [15174.412848]  ffff880081abf850 000000000694263c 000000000694263c ffff88013fd8f838
          [15174.413621] Call Trace:
          [15174.413871]  [<ffffffff81686f13>] dump_stack+0x19/0x1b
          [15174.414358]  [<ffffffff8168031a>] panic+0xe3/0x1f2
          [15174.414825]  [<ffffffffa0546894>] lbug_with_loc+0x64/0xb0 [libcfs]
          [15174.415415]  [<ffffffffa0b6577f>] ll_update_inode+0x36f/0x620 [lustre]
          [15174.416038]  [<ffffffff81219483>] ? inode_sb_list_add+0x43/0x50
          [15174.416601]  [<ffffffffa0b65a97>] ll_read_inode2+0x67/0x420 [lustre]
          [15174.417216]  [<ffffffffa0b741fb>] ll_iget+0xab/0x350 [lustre]
          [15174.417773]  [<ffffffffa0b67ad2>] ll_prep_inode+0x232/0xc80 [lustre]
          [15174.418414]  [<ffffffffa08cb580>] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc]
          [15174.419074]  [<ffffffffa0b73a20>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
          [15174.419852]  [<ffffffffa08cf4e0>] ? lustre_swab_generic_32s+0x20/0x20 [ptlrpc]
          [15174.420543]  [<ffffffffa0b749a9>] ll_lookup_it_finish+0x109/0x1090 [lustre]
          [15174.421200]  [<ffffffffa060cc10>] ? lmv_intent_lock+0x12d0/0x1b50 [lmv]
          [15174.421830]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
          [15174.422328]  [<ffffffffa0b74504>] ? ll_i2gids+0x24/0xb0 [lustre]
          [15174.422904]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
          [15174.423403]  [<ffffffffa0b73a20>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
          [15174.424179]  [<ffffffffa0b761ce>] ll_lookup_it+0x89e/0xee0 [lustre]
          [15174.424789]  [<ffffffffa0b77b9b>] ll_lookup_nd+0xbb/0x190 [lustre]
          [15174.425368]  [<ffffffff81208add>] lookup_real+0x1d/0x50
          [15174.425864]  [<ffffffff81209452>] __lookup_hash+0x42/0x60
          [15174.426378]  [<ffffffff816841f2>] lookup_slow+0x42/0xa7
          [15174.426878]  [<ffffffff8120c973>] path_lookupat+0x773/0x7a0
          [15174.427409]  [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0
          [15174.427971]  [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0
          [15174.428504]  [<ffffffff8120c9cb>] filename_lookup+0x2b/0xc0
          [15174.429036]  [<ffffffff812103e7>] user_path_at_empty+0x67/0xc0
          [15174.429587]  [<ffffffff812104eb>] ? do_filp_open+0x4b/0xb0
          [15174.430112]  [<ffffffff81210451>] user_path_at+0x11/0x20
          [15174.430631]  [<ffffffff812038c3>] vfs_fstatat+0x63/0xc0
          [15174.431135]  [<ffffffff81203e91>] SYSC_newlstat+0x31/0x60
          [15174.431648]  [<ffffffff81692e66>] ? trace_do_page_fault+0x56/0x150
          [15174.432238]  [<ffffffff8169250b>] ? do_async_page_fault+0x1b/0xd0
          [15174.432815]  [<ffffffff8168eff8>] ? async_page_fault+0x28/0x30
          [15174.433365]  [<ffffffff8120411e>] SyS_newlstat+0xe/0x10
          [15174.433864]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
          [    0.000000] Initializing cgroup subsys cpuset
          [    0.000000] Initializing cgroup subsys cpu
          [    0.000000] Initializing cgroup subsys cpuacct
          [    0.000000] Linux version 3.10.0-514.21.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Thu May 25 17:04:51 UTC 2017
          [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-514.21.1.el7.x86_64 root=UUID=f513f02d-54c7-42d0-bc3f-1a4b0d47ac07 ro console=tty0 LANG=en_US.UTF-8 console=ttyS0,115200 net.ifnames=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0 elfcorehdr=867700K
          
          sarah Sarah Liu added a comment - Bobi, Your debug patch is based on Jinshan's https://review.whamcloud.com/#/c/27148/ , with his patch, it is not easy to reproduce. So I added the line back to test-framework.sh and reproduce it. running: conf-sanity ONLY=46a run_suite conf-sanity /usr/lib64/lustre/tests/conf-sanity.sh [15069.537681] Lustre: DEBUG MARKER: -----============= acceptance-small: conf-sanity ============----- Thu Jun 15 21:38:44 UTC 2017 [15069.902165] Lustre: DEBUG MARKER: excepting tests: 32newtarball 101 [15070.034791] Lustre: Unmounted lustre-client [15119.059428] Key type lgssc unregistered [15119.059933] Lustre: 18766:0:(gss_mech_switch.c:81:lgss_mech_unregister()) Unregister krb5 mechanism [15124.614091] LNet: Removed LNI 10.9.6.136@tcp [15125.631080] LNet: HW nodes: 1, HW CPU cores: 4, npartitions: 1 [15125.634023] alg: No test for adler32 (adler32-zlib) [15125.634592] alg: No test for crc32 (crc32-table) [15126.534615] Lustre: DEBUG MARKER: == conf-sanity test 46a: handle ost additional - wide striped file =================================== 21:39:40 (1497562780) [15128.207906] Lustre: Lustre: Build Version: 2.9.59_16_g233342f [15128.232085] LNet: Added LNI 10.9.6.136@tcp [8/256/0/180] [15128.232728] LNet: Accept secure, port 988 [15129.739865] Lustre: 19557:0:(gss_svc_upcall.c:1178:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active [15129.741378] Lustre: 19557:0:(gss_mech_switch.c:72:lgss_mech_register()) Register gssnull mechanism [15129.742225] Key type lgssc registered [15129.794743] Lustre: Echo OBD driver; http://www.lustre.org/ [15167.777159] Lustre: Mounted lustre-client [15173.996457] Lustre: DEBUG MARKER: trevis-58vm1.trevis.hpdd.intel.com: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff8800b4796800.ost_server_uuid 40 [15174.200433] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-ffff8800b4796800.ost_server_uuid in FULL state after 0 sec [15174.277151] Lustre: Mounted lustre-client [15174.378436] LustreError: 21918:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1 [15174.379473] Lustre: 21918:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1 [15174.380433] Lustre: 21918:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0 [15174.381483] Lustre: 21918:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34 [15174.382314] Lustre: 21918:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2 [15174.383184] LustreError: 21918:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22 [15174.384297] LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) ASSERTION( rc == 0 ) failed: [15174.385191] LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) LBUG Message from syslogd@trevis-58vm1 at Jun 15 21:40:29 ... kernel:LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) ASSERTION( rc == 0 ) failed: Message from syslogd@trevis-58vm1 at Jun 15 21:40:29 ... kernel:LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) LBUG [15174.387462] Pid: 21918, comm: stat [15174.387797] [15174.387797] Call Trace: [15174.388182] [<ffffffffa05467ee>] libcfs_call_trace+0x4e/0x60 [libcfs] [15174.388817] [<ffffffffa054687c>] lbug_with_loc+0x4c/0xb0 [libcfs] [15174.389508] [<ffffffffa0b6577f>] ll_update_inode+0x36f/0x620 [lustre] [15174.390133] [<ffffffff81219483>] ? inode_sb_list_add+0x43/0x50 [15174.390717] [<ffffffffa0b65a97>] ll_read_inode2+0x67/0x420 [lustre] [15174.391397] [<ffffffffa0b741fb>] ll_iget+0xab/0x350 [lustre] [15174.391956] [<ffffffffa0b67ad2>] ll_prep_inode+0x232/0xc80 [lustre] [15174.392599] [<ffffffffa08cb580>] ? lustre_msg_buf+0x0/0x60 [ptlrpc] [15174.393298] [<ffffffffa0b73a20>] ? ll_md_blocking_ast+0x0/0x730 [lustre] [15174.393957] [<ffffffffa08cf4e0>] ? lustre_swab_mdt_body+0x0/0x140 [ptlrpc] [15174.394632] [<ffffffffa0b749a9>] ll_lookup_it_finish+0x109/0x1090 [lustre] [15174.395373] [<ffffffffa060cc10>] ? lmv_intent_lock+0x12d0/0x1b50 [lmv] [15174.395998] [<ffffffff81114b82>] ? from_kgid+0x12/0x20 [15174.396518] [<ffffffffa0b74504>] ? ll_i2gids+0x24/0xb0 [lustre] [15174.397147] [<ffffffff81114b82>] ? from_kgid+0x12/0x20 [15174.397660] [<ffffffffa0b73a20>] ? ll_md_blocking_ast+0x0/0x730 [lustre] [15174.398373] [<ffffffffa0b761ce>] ll_lookup_it+0x89e/0xee0 [lustre] [15174.398983] [<ffffffffa0b77b9b>] ll_lookup_nd+0xbb/0x190 [lustre] [15174.399579] [<ffffffff81208add>] lookup_real+0x1d/0x50 [15174.400159] [<ffffffff81209452>] __lookup_hash+0x42/0x60 [15174.400684] [<ffffffff816841f2>] lookup_slow+0x42/0xa7 [15174.401252] [<ffffffff8120c973>] path_lookupat+0x773/0x7a0 [15174.401783] [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0 [15174.402346] [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0 [15174.402945] [<ffffffff8120c9cb>] filename_lookup+0x2b/0xc0 [15174.403480] [<ffffffff812103e7>] user_path_at_empty+0x67/0xc0 [15174.404092] [<ffffffff812104eb>] ? do_filp_open+0x4b/0xb0 [15174.404623] [<ffffffff81210451>] user_path_at+0x11/0x20 [15174.405174] [<ffffffff812038c3>] vfs_fstatat+0x63/0xc0 [15174.405679] [<ffffffff81203e91>] SYSC_newlstat+0x31/0x60 [15174.406259] [<ffffffff81692e66>] ? trace_do_page_fault+0x56/0x150 [15174.406853] [<ffffffff8169250b>] ? do_async_page_fault+0x1b/0xd0 [15174.407435] [<ffffffff8168eff8>] ? async_page_fault+0x28/0x30 [15174.408068] [<ffffffff8120411e>] SyS_newlstat+0xe/0x10 [15174.408581] [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b [15174.409197] [15174.409364] Kernel panic - not syncing: LBUG [15174.409818] CPU: 3 PID: 21918 Comm: stat Tainted: G OE ------------ 3.10.0-514.21.1.el7.x86_64 #1 [15174.410752] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [15174.411297] ffffffffa0564e8b 000000000694263c ffff880081abf820 ffffffff81686f13 [15174.412073] ffff880081abf8a0 ffffffff8168031a ffffffff00000008 ffff880081abf8b0 [15174.412848] ffff880081abf850 000000000694263c 000000000694263c ffff88013fd8f838 [15174.413621] Call Trace: [15174.413871] [<ffffffff81686f13>] dump_stack+0x19/0x1b [15174.414358] [<ffffffff8168031a>] panic+0xe3/0x1f2 [15174.414825] [<ffffffffa0546894>] lbug_with_loc+0x64/0xb0 [libcfs] [15174.415415] [<ffffffffa0b6577f>] ll_update_inode+0x36f/0x620 [lustre] [15174.416038] [<ffffffff81219483>] ? inode_sb_list_add+0x43/0x50 [15174.416601] [<ffffffffa0b65a97>] ll_read_inode2+0x67/0x420 [lustre] [15174.417216] [<ffffffffa0b741fb>] ll_iget+0xab/0x350 [lustre] [15174.417773] [<ffffffffa0b67ad2>] ll_prep_inode+0x232/0xc80 [lustre] [15174.418414] [<ffffffffa08cb580>] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] [15174.419074] [<ffffffffa0b73a20>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre] [15174.419852] [<ffffffffa08cf4e0>] ? lustre_swab_generic_32s+0x20/0x20 [ptlrpc] [15174.420543] [<ffffffffa0b749a9>] ll_lookup_it_finish+0x109/0x1090 [lustre] [15174.421200] [<ffffffffa060cc10>] ? lmv_intent_lock+0x12d0/0x1b50 [lmv] [15174.421830] [<ffffffff81114b82>] ? from_kgid+0x12/0x20 [15174.422328] [<ffffffffa0b74504>] ? ll_i2gids+0x24/0xb0 [lustre] [15174.422904] [<ffffffff81114b82>] ? from_kgid+0x12/0x20 [15174.423403] [<ffffffffa0b73a20>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre] [15174.424179] [<ffffffffa0b761ce>] ll_lookup_it+0x89e/0xee0 [lustre] [15174.424789] [<ffffffffa0b77b9b>] ll_lookup_nd+0xbb/0x190 [lustre] [15174.425368] [<ffffffff81208add>] lookup_real+0x1d/0x50 [15174.425864] [<ffffffff81209452>] __lookup_hash+0x42/0x60 [15174.426378] [<ffffffff816841f2>] lookup_slow+0x42/0xa7 [15174.426878] [<ffffffff8120c973>] path_lookupat+0x773/0x7a0 [15174.427409] [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0 [15174.427971] [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0 [15174.428504] [<ffffffff8120c9cb>] filename_lookup+0x2b/0xc0 [15174.429036] [<ffffffff812103e7>] user_path_at_empty+0x67/0xc0 [15174.429587] [<ffffffff812104eb>] ? do_filp_open+0x4b/0xb0 [15174.430112] [<ffffffff81210451>] user_path_at+0x11/0x20 [15174.430631] [<ffffffff812038c3>] vfs_fstatat+0x63/0xc0 [15174.431135] [<ffffffff81203e91>] SYSC_newlstat+0x31/0x60 [15174.431648] [<ffffffff81692e66>] ? trace_do_page_fault+0x56/0x150 [15174.432238] [<ffffffff8169250b>] ? do_async_page_fault+0x1b/0xd0 [15174.432815] [<ffffffff8168eff8>] ? async_page_fault+0x28/0x30 [15174.433365] [<ffffffff8120411e>] SyS_newlstat+0xe/0x10 [15174.433864] [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b [ 0.000000] Initializing cgroup subsys cpuset [ 0.000000] Initializing cgroup subsys cpu [ 0.000000] Initializing cgroup subsys cpuacct [ 0.000000] Linux version 3.10.0-514.21.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Thu May 25 17:04:51 UTC 2017 [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-514.21.1.el7.x86_64 root=UUID=f513f02d-54c7-42d0-bc3f-1a4b0d47ac07 ro console=tty0 LANG=en_US.UTF-8 console=ttyS0,115200 net.ifnames=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0 elfcorehdr=867700K
          sarah Sarah Liu added a comment -

          Sure, I will update the ticket if I get sth.

          sarah Sarah Liu added a comment - Sure, I will update the ticket if I get sth.
          bobijam Zhenyu Xu added a comment -

          Hi Sarah,

          Can you try to reproduce this issue again with this debug patch? I want to know whether it has something to do with the memory corruption, if the reproduction fails the assertion which I adds, then I'll make another fix patch.

          bobijam Zhenyu Xu added a comment - Hi Sarah, Can you try to reproduce this issue again with this debug patch? I want to know whether it has something to do with the memory corruption, if the reproduction fails the assertion which I adds, then I'll make another fix patch.

          Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/27658
          Subject: LU-9485 debug: check a suspicion
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 233342f4850ce7dea0c5026b5d09b0f8dc773968

          gerrit Gerrit Updater added a comment - Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/27658 Subject: LU-9485 debug: check a suspicion Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 233342f4850ce7dea0c5026b5d09b0f8dc773968
          sarah Sarah Liu added a comment -

          this issue is easy to reproduce on zfs.

          sarah Sarah Liu added a comment - this issue is easy to reproduce on zfs.
          bobijam Zhenyu Xu added a comment -

          FYI, tried several error injections on my local VM (ldiskfs backend), cannot trigger the memory corruption yet.

          bobijam Zhenyu Xu added a comment - FYI, tried several error injections on my local VM (ldiskfs backend), cannot trigger the memory corruption yet.
          jay Jinshan Xiong (Inactive) added a comment - - edited

          Oleg - if this were double free, wouldn't it be caught at the 2nd time of call_rcu()?

          jay Jinshan Xiong (Inactive) added a comment - - edited Oleg - if this were double free, wouldn't it be caught at the 2nd time of call_rcu() ?
          green Oleg Drokin added a comment -

          There's interesting message I see in the above dmesg dump:

          [ 1640.994159] ODEBUG: active_state not available (active state 0) object type: rcu_head hint:           (null)
          

          I guess that means a double free or something similar?

          green Oleg Drokin added a comment - There's interesting message I see in the above dmesg dump: [ 1640.994159] ODEBUG: active_state not available (active state 0) object type: rcu_head hint: ( null ) I guess that means a double free or something similar?

          I found something about this issue.

          It turned out the previous data structure on the RCU list was as follows:

          [ 3782.106995] next = ffff8800afe2d2a7, list = ffff88003553dd00, list func:           (null)
          [ 3782.107782] next list: next = ff88007085b2f0ff, list = ffff8800afe2d2a7, list_func = ffffffa0b53fe0ff
          

          Unfortunately it doesn't come with a free function that means it should be freed by free_rcu. I also turned on malloc debug log but it seems this piece of memory has never been used by Lustre. It's not the corresponding dentry that failed inode initialization either.

          jay Jinshan Xiong (Inactive) added a comment - I found something about this issue. It turned out the previous data structure on the RCU list was as follows: [ 3782.106995] next = ffff8800afe2d2a7, list = ffff88003553dd00, list func: (null) [ 3782.107782] next list: next = ff88007085b2f0ff, list = ffff8800afe2d2a7, list_func = ffffffa0b53fe0ff Unfortunately it doesn't come with a free function that means it should be freed by free_rcu . I also turned on malloc debug log but it seems this piece of memory has never been used by Lustre. It's not the corresponding dentry that failed inode initialization either.
          sarah Sarah Liu added a comment -

          please check crash dump on
          [root@trevis-58vm1 ~]# ls /var/crash/
          127.0.0.1-2017-06-05-22:42:02

          sarah Sarah Liu added a comment - please check crash dump on [root@trevis-58vm1 ~] # ls /var/crash/ 127.0.0.1-2017-06-05-22:42:02

          Sarah - as we discussed last Friday, please make crash dump work and enable MALLOC debug only.

          jay Jinshan Xiong (Inactive) added a comment - Sarah - as we discussed last Friday, please make crash dump work and enable MALLOC debug only.

          People

            jay Jinshan Xiong (Inactive)
            maloo Maloo
            Votes:
            1 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: