Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9485

conf-sanity test_46a: cl_file_inode_init()) Failure to initialize cl object

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0
    • Lustre 2.10.0
    • None
    • FSTYPE=zfs
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for John Hammond <john.hammond@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/2d9acc98-3546-11e7-b0a8-5254006e85c2.

      The sub-test test_46a failed with the following error:

      test failed to respond and timed out
      

      From the client:

      01:20:15:[ 8727.094028] Lustre: Mounted lustre-client
      01:20:15:[ 8727.326631] LustreError: 14973:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
      01:20:15:[ 8727.329204] Lustre: 14973:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
      01:20:15:[ 8727.331709] Lustre: 14973:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
      01:20:15:[ 8727.334195] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
      01:20:15:[ 8727.336551] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
      01:20:15:[ 8727.338877] LustreError: 14973:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22
      01:20:15:[ 8727.343166] Lustre: 14973:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
      01:20:15:[ 8727.345641] Lustre: 14973:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
      01:20:15:[ 8727.348135] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
      01:20:15:[ 8727.350427] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
      01:20:15:[ 8727.352806] LustreError: 14973:0:(llite_lib.c:2256:ll_prep_inode()) new_inode -fatal: rc -22
      01:20:15:[ 8727.362035] BUG: unable to handle kernel paging request at ffffffa0c9b0b0ff
      01:20:15:[ 8727.363006] IP: [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
      01:20:15:[ 8727.363006] PGD 19bd067 PUD 0 
      01:20:15:[ 8727.363006] Oops: 0010 [#1] SMP 
      01:20:15:[ 8727.363006] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi ppdev crc32_pclmul ghash_clmulni_intel aesni_intel parport_pc parport lrw gf128mul glue_helper ablk_helper cryptd virtio_balloon i2c_piix4 pcspkr nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus crct10dif_pclmul crct10dif_common drm_kms_helper 8139too ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel serio_raw ttm virtio_pci virtio_ring virtio drm 8139cp i2c_core mii libata floppy [last unloaded: libcfs]
      01:20:15:[ 8727.363006] CPU: 0 PID: 14979 Comm: bash Tainted: G        W  OE  ------------   3.10.0-514.16.1.el7.x86_64 #1
      01:20:15:[ 8727.363006] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      01:20:15:[ 8727.363006] task: ffff88007ae76dd0 ti: ffff88007a240000 task.ti: ffff88007a240000
      01:20:15:[ 8727.363006] RIP: 0010:[<ffffffa0c9b0b0ff>]  [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
      01:20:15:[ 8727.363006] RSP: 0018:ffff88007fc03e78  EFLAGS: 00010286
      01:20:15:[ 8727.363006] RAX: ffffffa0c9b0b0ff RBX: ffffffff81a1a5c0 RCX: 0000000000019b00
      01:20:15:[ 8727.363006] RDX: ffff880067b8caa7 RSI: 0000000000000003 RDI: ffff880067b8caa7
      01:20:15:[ 8727.363006] RBP: ffff88007fc03ed0 R08: ffff88007b7b7e00 R09: 000000018010000f
      01:20:15:[ 8727.363006] R10: 0000000000000001 R11: ffffea0001ededc0 R12: 000000000000000a
      01:20:15:[ 8727.363006] R13: 0000000000000001 R14: ff8800658b5fb0ff R15: ffff88007fc101c0
      01:20:15:[ 8727.363006] FS:  00007f4f234d3740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
      01:20:15:[ 8727.363006] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      01:20:15:[ 8727.363006] CR2: ffffffa0c9b0b0ff CR3: 00000000782bb000 CR4: 00000000000406f0
      01:20:15:[ 8727.363006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      01:20:15:[ 8727.363006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      01:20:15:[ 8727.363006] Stack:
      01:20:15:[ 8727.363006]  ffffffff81139bbd ffff880066441800 0000000000000001 ffff88007fc101e8
      01:20:15:[ 8727.363006]  ffff88007ae76dd0 ffff88007a243fd8 0000000000000009 0000000000000009
      01:20:15:[ 8727.363006]  ffff880067d22f58 ffffffff819b3108 0000000000000001 ffff88007fc03f40
      01:20:15:[ 8727.363006] Call Trace:
      01:20:15:[ 8727.363006]  <IRQ> 
      01:20:15:[ 8727.363006]  [<ffffffff81139bbd>] ? rcu_process_callbacks+0x1dd/0x550
      01:20:15:[ 8727.363006]  [<ffffffff8108f63f>] __do_softirq+0xef/0x280
      01:20:15:[ 8727.363006]  [<ffffffff81698c1c>] call_softirq+0x1c/0x30
      01:20:15:[ 8727.363006]  [<ffffffff8102d365>] do_softirq+0x65/0xa0
      01:20:15:[ 8727.363006]  [<ffffffff8108f9d5>] irq_exit+0x115/0x120
      01:20:15:[ 8727.363006]  [<ffffffff81699895>] smp_apic_timer_interrupt+0x45/0x60
      01:20:15:[ 8727.363006]  [<ffffffff81697ddd>] apic_timer_interrupt+0x6d/0x80
      01:20:15:[ 8727.363006]  <EOI> 
      01:20:15:[ 8727.363006]  [<ffffffff8121ccf3>] ? dup_fd+0x153/0x2d0
      01:20:15:[ 8727.363006]  [<ffffffff8121ccbf>] ? dup_fd+0x11f/0x2d0
      01:20:15:[ 8727.363006]  [<ffffffff81083f70>] copy_process+0xc10/0x1960
      01:20:15:[ 8727.363006]  [<ffffffff81084e71>] do_fork+0x91/0x2c0
      01:20:16:[ 8727.363006]  [<ffffffff81085126>] SyS_clone+0x16/0x20
      01:20:16:[ 8727.363006]  [<ffffffff816974d9>] stub_clone+0x69/0x90
      01:20:16:[ 8727.363006]  [<ffffffff81697189>] ? system_call_fastpath+0x16/0x1b
      01:20:16:[ 8727.363006] Code:  Bad RIP value.
      01:20:16:[ 8727.363006] RIP  [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
      01:20:16:[ 8727.363006]  RSP <ffff88007fc03e78>
      01:20:16:[ 8727.363006] CR2: ffffffa0c9b0b0ff
      

      https://testing.hpdd.intel.com/test_logs/2f0f1c78-3546-11e7-b0a8-5254006e85c2/show_text

      Info required for matching: conf-sanity 46a

      Attachments

        Activity

          [LU-9485] conf-sanity test_46a: cl_file_inode_init()) Failure to initialize cl object
          pjones Peter Jones added a comment -

          Landed for 2.10

          pjones Peter Jones added a comment - Landed for 2.10

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27658/
          Subject: LU-9485 llite: check the return value of cl_file_inode_init()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: ca41d88d9228e469a8728c20c21c7e919b77dc12

          gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27658/ Subject: LU-9485 llite: check the return value of cl_file_inode_init() Project: fs/lustre-release Branch: master Current Patch Set: Commit: ca41d88d9228e469a8728c20c21c7e919b77dc12
          bobijam Zhenyu Xu added a comment -

          yes, patchset 2 should be the fix patch, I'll update it and request reviews for future landing.

          bobijam Zhenyu Xu added a comment - yes, patchset 2 should be the fix patch, I'll update it and request reviews for future landing.
          pjones Peter Jones added a comment -

          Bobijam

          So patchset 2 was more than just a diagnostic patch - it was a possible (and now proven) fix? Does that mean that we should rollback to v2 of the patch and get reviews with a view to landing this patch?

          Peter

          pjones Peter Jones added a comment - Bobijam So patchset 2 was more than just a diagnostic patch - it was a possible (and now proven) fix? Does that mean that we should rollback to v2 of the patch and get reviews with a view to landing this patch? Peter
          bobijam Zhenyu Xu added a comment -

          that's a good news, meaning that the patch fixes the memory issue.

          bobijam Zhenyu Xu added a comment - that's a good news, meaning that the patch fixes the memory issue.
          sarah Sarah Liu added a comment - - edited

          I have run patchset #2 locally with the line added several times and cannot reproduce it this time, but the test failed as
          https://testing.hpdd.intel.com/test_sets/3cec9368-52e0-11e7-8a1b-5254006e85c2

          [19954.374299] Lustre: DEBUG MARKER: == conf-sanity test 46a: handle ost additional - wide striped file =================================== 21:51:35 (1497649895)
          [19956.043116] Lustre: Lustre: Build Version: 2.9.59_16_gbe2e8e5
          [19956.068256] LNet: Added LNI 10.9.6.136@tcp [8/256/0/180]
          [19956.068913] LNet: Accept secure, port 988
          [19957.576952] Lustre: 2814:0:(gss_svc_upcall.c:1178:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active
          [19957.578575] Lustre: 2814:0:(gss_mech_switch.c:72:lgss_mech_register()) Register gssnull mechanism
          [19957.579450] Key type lgssc registered
          [19957.631952] Lustre: Echo OBD driver; http://www.lustre.org/
          [19994.598809] Lustre: Mounted lustre-client
          [19999.823173] Lustre: DEBUG MARKER: trevis-58vm1.trevis.hpdd.intel.com: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff8800ab927800.ost_server_uuid 40
          [20000.027082] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-ffff8800ab927800.ost_server_uuid in FULL state after 0 sec
          [20000.102424] Lustre: Mounted lustre-client
          [20000.202406] LustreError: 5182:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
          [20000.203449] Lustre: 5182:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
          [20000.204419] Lustre: 5182:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
          [20000.205502] Lustre: 5182:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
          [20000.206337] Lustre: 5182:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
          [20000.207177] LustreError: 5182:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22
          [20000.208389] LustreError: 5182:0:(llite_lib.c:2273:ll_prep_inode()) new_inode -fatal: rc -22
          [20000.368646] Lustre: DEBUG MARKER: conf-sanity test_46a: @@@@@@ FAIL: stat /mnt/lustre/widestripe failed
          [20002.819582] LustreError: 5512:0:(obd_config.c:1429:class_process_proc_param()) lov.: error writing proc entry 'stripecount': rc = -34
          [20008.013856] Lustre: Unmounted lustre-client
          conf-sanity returned 0
          Stopping clients: trevis-58vm1.trevis.hpdd.intel.com /mnt/lustre (opts:)
          Stopping clients: trevis-58vm1.trevis.hpdd.intel.com /mnt/lustre2 (opts:)
          Stopping /mnt/lustre-mds1 (opts:-f) on trevis-73vm11
          [20031.824103] Key type lgssc unregistered
          [20031.824613] Lustre: 6228:0:(gss_mech_switch.c:81:lgss_mech_unregister()) Unregister krb5 mechanism
          [20037.824058] LNet: Removed LNI 10.9.6.136@tcp
          modules unloaded.
          Finished at Fri Jun 16 21:53:00 UTC 2017 in 176s
          auster: completed with rc 0
          
          
          sarah Sarah Liu added a comment - - edited I have run patchset #2 locally with the line added several times and cannot reproduce it this time, but the test failed as https://testing.hpdd.intel.com/test_sets/3cec9368-52e0-11e7-8a1b-5254006e85c2 [19954.374299] Lustre: DEBUG MARKER: == conf-sanity test 46a: handle ost additional - wide striped file =================================== 21:51:35 (1497649895) [19956.043116] Lustre: Lustre: Build Version: 2.9.59_16_gbe2e8e5 [19956.068256] LNet: Added LNI 10.9.6.136@tcp [8/256/0/180] [19956.068913] LNet: Accept secure, port 988 [19957.576952] Lustre: 2814:0:(gss_svc_upcall.c:1178:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active [19957.578575] Lustre: 2814:0:(gss_mech_switch.c:72:lgss_mech_register()) Register gssnull mechanism [19957.579450] Key type lgssc registered [19957.631952] Lustre: Echo OBD driver; http://www.lustre.org/ [19994.598809] Lustre: Mounted lustre-client [19999.823173] Lustre: DEBUG MARKER: trevis-58vm1.trevis.hpdd.intel.com: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff8800ab927800.ost_server_uuid 40 [20000.027082] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-ffff8800ab927800.ost_server_uuid in FULL state after 0 sec [20000.102424] Lustre: Mounted lustre-client [20000.202406] LustreError: 5182:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1 [20000.203449] Lustre: 5182:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1 [20000.204419] Lustre: 5182:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0 [20000.205502] Lustre: 5182:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34 [20000.206337] Lustre: 5182:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2 [20000.207177] LustreError: 5182:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22 [20000.208389] LustreError: 5182:0:(llite_lib.c:2273:ll_prep_inode()) new_inode -fatal: rc -22 [20000.368646] Lustre: DEBUG MARKER: conf-sanity test_46a: @@@@@@ FAIL: stat /mnt/lustre/widestripe failed [20002.819582] LustreError: 5512:0:(obd_config.c:1429:class_process_proc_param()) lov.: error writing proc entry 'stripecount': rc = -34 [20008.013856] Lustre: Unmounted lustre-client conf-sanity returned 0 Stopping clients: trevis-58vm1.trevis.hpdd.intel.com /mnt/lustre (opts:) Stopping clients: trevis-58vm1.trevis.hpdd.intel.com /mnt/lustre2 (opts:) Stopping /mnt/lustre-mds1 (opts:-f) on trevis-73vm11 [20031.824103] Key type lgssc unregistered [20031.824613] Lustre: 6228:0:(gss_mech_switch.c:81:lgss_mech_unregister()) Unregister krb5 mechanism [20037.824058] LNet: Removed LNI 10.9.6.136@tcp modules unloaded. Finished at Fri Jun 16 21:53:00 UTC 2017 in 176s auster: completed with rc 0
          sarah Sarah Liu added a comment -

          I update the patch base on yours with the problematic line reinstated and adding test-parameter.

          sarah Sarah Liu added a comment - I update the patch base on yours with the problematic line reinstated and adding test-parameter.
          bobijam Zhenyu Xu added a comment -

          Would you please add the line back to test-framework.sh and apply the updated patch (patchset 2) and try to reproduce it? Thank you.

          bobijam Zhenyu Xu added a comment - Would you please add the line back to test-framework.sh and apply the updated patch (patchset 2) and try to reproduce it? Thank you.
          sarah Sarah Liu added a comment -

          Bobi,

          Your debug patch is based on Jinshan's https://review.whamcloud.com/#/c/27148/, with his patch, it is not easy to reproduce. So I added the line back to test-framework.sh and reproduce it.

          running: conf-sanity ONLY=46a 
          run_suite conf-sanity /usr/lib64/lustre/tests/conf-sanity.sh
          [15069.537681] Lustre: DEBUG MARKER: -----============= acceptance-small: conf-sanity ============----- Thu Jun 15 21:38:44 UTC 2017
          [15069.902165] Lustre: DEBUG MARKER: excepting tests: 32newtarball 101
          [15070.034791] Lustre: Unmounted lustre-client
          [15119.059428] Key type lgssc unregistered
          [15119.059933] Lustre: 18766:0:(gss_mech_switch.c:81:lgss_mech_unregister()) Unregister krb5 mechanism
          [15124.614091] LNet: Removed LNI 10.9.6.136@tcp
          [15125.631080] LNet: HW nodes: 1, HW CPU cores: 4, npartitions: 1
          [15125.634023] alg: No test for adler32 (adler32-zlib)
          [15125.634592] alg: No test for crc32 (crc32-table)
          [15126.534615] Lustre: DEBUG MARKER: == conf-sanity test 46a: handle ost additional - wide striped file =================================== 21:39:40 (1497562780)
          [15128.207906] Lustre: Lustre: Build Version: 2.9.59_16_g233342f
          [15128.232085] LNet: Added LNI 10.9.6.136@tcp [8/256/0/180]
          [15128.232728] LNet: Accept secure, port 988
          [15129.739865] Lustre: 19557:0:(gss_svc_upcall.c:1178:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active
          [15129.741378] Lustre: 19557:0:(gss_mech_switch.c:72:lgss_mech_register()) Register gssnull mechanism
          [15129.742225] Key type lgssc registered
          [15129.794743] Lustre: Echo OBD driver; http://www.lustre.org/
          [15167.777159] Lustre: Mounted lustre-client
          [15173.996457] Lustre: DEBUG MARKER: trevis-58vm1.trevis.hpdd.intel.com: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff8800b4796800.ost_server_uuid 40
          [15174.200433] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-ffff8800b4796800.ost_server_uuid in FULL state after 0 sec
          [15174.277151] Lustre: Mounted lustre-client
          [15174.378436] LustreError: 21918:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
          [15174.379473] Lustre: 21918:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
          [15174.380433] Lustre: 21918:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
          [15174.381483] Lustre: 21918:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
          [15174.382314] Lustre: 21918:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
          [15174.383184] LustreError: 21918:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22
          [15174.384297] LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) ASSERTION( rc == 0 ) failed: 
          [15174.385191] LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) LBUG
          
          Message from syslogd@trevis-58vm1 at Jun 15 21:40:29 ...
           kernel:LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) ASSERTION( rc == 0 ) failed: 
          
          Message from syslogd@trevis-58vm1 at Jun 15 21:40:29 ...
           kernel:LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) LBUG
          [15174.387462] Pid: 21918, comm: stat
          [15174.387797] 
          [15174.387797] Call Trace:
          [15174.388182]  [<ffffffffa05467ee>] libcfs_call_trace+0x4e/0x60 [libcfs]
          [15174.388817]  [<ffffffffa054687c>] lbug_with_loc+0x4c/0xb0 [libcfs]
          [15174.389508]  [<ffffffffa0b6577f>] ll_update_inode+0x36f/0x620 [lustre]
          [15174.390133]  [<ffffffff81219483>] ? inode_sb_list_add+0x43/0x50
          [15174.390717]  [<ffffffffa0b65a97>] ll_read_inode2+0x67/0x420 [lustre]
          [15174.391397]  [<ffffffffa0b741fb>] ll_iget+0xab/0x350 [lustre]
          [15174.391956]  [<ffffffffa0b67ad2>] ll_prep_inode+0x232/0xc80 [lustre]
          [15174.392599]  [<ffffffffa08cb580>] ? lustre_msg_buf+0x0/0x60 [ptlrpc]
          [15174.393298]  [<ffffffffa0b73a20>] ? ll_md_blocking_ast+0x0/0x730 [lustre]
          [15174.393957]  [<ffffffffa08cf4e0>] ? lustre_swab_mdt_body+0x0/0x140 [ptlrpc]
          [15174.394632]  [<ffffffffa0b749a9>] ll_lookup_it_finish+0x109/0x1090 [lustre]
          [15174.395373]  [<ffffffffa060cc10>] ? lmv_intent_lock+0x12d0/0x1b50 [lmv]
          [15174.395998]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
          [15174.396518]  [<ffffffffa0b74504>] ? ll_i2gids+0x24/0xb0 [lustre]
          [15174.397147]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
          [15174.397660]  [<ffffffffa0b73a20>] ? ll_md_blocking_ast+0x0/0x730 [lustre]
          [15174.398373]  [<ffffffffa0b761ce>] ll_lookup_it+0x89e/0xee0 [lustre]
          [15174.398983]  [<ffffffffa0b77b9b>] ll_lookup_nd+0xbb/0x190 [lustre]
          [15174.399579]  [<ffffffff81208add>] lookup_real+0x1d/0x50
          [15174.400159]  [<ffffffff81209452>] __lookup_hash+0x42/0x60
          [15174.400684]  [<ffffffff816841f2>] lookup_slow+0x42/0xa7
          [15174.401252]  [<ffffffff8120c973>] path_lookupat+0x773/0x7a0
          [15174.401783]  [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0
          [15174.402346]  [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0
          [15174.402945]  [<ffffffff8120c9cb>] filename_lookup+0x2b/0xc0
          [15174.403480]  [<ffffffff812103e7>] user_path_at_empty+0x67/0xc0
          [15174.404092]  [<ffffffff812104eb>] ? do_filp_open+0x4b/0xb0
          [15174.404623]  [<ffffffff81210451>] user_path_at+0x11/0x20
          [15174.405174]  [<ffffffff812038c3>] vfs_fstatat+0x63/0xc0
          [15174.405679]  [<ffffffff81203e91>] SYSC_newlstat+0x31/0x60
          [15174.406259]  [<ffffffff81692e66>] ? trace_do_page_fault+0x56/0x150
          [15174.406853]  [<ffffffff8169250b>] ? do_async_page_fault+0x1b/0xd0
          [15174.407435]  [<ffffffff8168eff8>] ? async_page_fault+0x28/0x30
          [15174.408068]  [<ffffffff8120411e>] SyS_newlstat+0xe/0x10
          [15174.408581]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
          [15174.409197] 
          [15174.409364] Kernel panic - not syncing: LBUG
          [15174.409818] CPU: 3 PID: 21918 Comm: stat Tainted: G           OE  ------------   3.10.0-514.21.1.el7.x86_64 #1
          [15174.410752] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
          [15174.411297]  ffffffffa0564e8b 000000000694263c ffff880081abf820 ffffffff81686f13
          [15174.412073]  ffff880081abf8a0 ffffffff8168031a ffffffff00000008 ffff880081abf8b0
          [15174.412848]  ffff880081abf850 000000000694263c 000000000694263c ffff88013fd8f838
          [15174.413621] Call Trace:
          [15174.413871]  [<ffffffff81686f13>] dump_stack+0x19/0x1b
          [15174.414358]  [<ffffffff8168031a>] panic+0xe3/0x1f2
          [15174.414825]  [<ffffffffa0546894>] lbug_with_loc+0x64/0xb0 [libcfs]
          [15174.415415]  [<ffffffffa0b6577f>] ll_update_inode+0x36f/0x620 [lustre]
          [15174.416038]  [<ffffffff81219483>] ? inode_sb_list_add+0x43/0x50
          [15174.416601]  [<ffffffffa0b65a97>] ll_read_inode2+0x67/0x420 [lustre]
          [15174.417216]  [<ffffffffa0b741fb>] ll_iget+0xab/0x350 [lustre]
          [15174.417773]  [<ffffffffa0b67ad2>] ll_prep_inode+0x232/0xc80 [lustre]
          [15174.418414]  [<ffffffffa08cb580>] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc]
          [15174.419074]  [<ffffffffa0b73a20>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
          [15174.419852]  [<ffffffffa08cf4e0>] ? lustre_swab_generic_32s+0x20/0x20 [ptlrpc]
          [15174.420543]  [<ffffffffa0b749a9>] ll_lookup_it_finish+0x109/0x1090 [lustre]
          [15174.421200]  [<ffffffffa060cc10>] ? lmv_intent_lock+0x12d0/0x1b50 [lmv]
          [15174.421830]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
          [15174.422328]  [<ffffffffa0b74504>] ? ll_i2gids+0x24/0xb0 [lustre]
          [15174.422904]  [<ffffffff81114b82>] ? from_kgid+0x12/0x20
          [15174.423403]  [<ffffffffa0b73a20>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre]
          [15174.424179]  [<ffffffffa0b761ce>] ll_lookup_it+0x89e/0xee0 [lustre]
          [15174.424789]  [<ffffffffa0b77b9b>] ll_lookup_nd+0xbb/0x190 [lustre]
          [15174.425368]  [<ffffffff81208add>] lookup_real+0x1d/0x50
          [15174.425864]  [<ffffffff81209452>] __lookup_hash+0x42/0x60
          [15174.426378]  [<ffffffff816841f2>] lookup_slow+0x42/0xa7
          [15174.426878]  [<ffffffff8120c973>] path_lookupat+0x773/0x7a0
          [15174.427409]  [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0
          [15174.427971]  [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0
          [15174.428504]  [<ffffffff8120c9cb>] filename_lookup+0x2b/0xc0
          [15174.429036]  [<ffffffff812103e7>] user_path_at_empty+0x67/0xc0
          [15174.429587]  [<ffffffff812104eb>] ? do_filp_open+0x4b/0xb0
          [15174.430112]  [<ffffffff81210451>] user_path_at+0x11/0x20
          [15174.430631]  [<ffffffff812038c3>] vfs_fstatat+0x63/0xc0
          [15174.431135]  [<ffffffff81203e91>] SYSC_newlstat+0x31/0x60
          [15174.431648]  [<ffffffff81692e66>] ? trace_do_page_fault+0x56/0x150
          [15174.432238]  [<ffffffff8169250b>] ? do_async_page_fault+0x1b/0xd0
          [15174.432815]  [<ffffffff8168eff8>] ? async_page_fault+0x28/0x30
          [15174.433365]  [<ffffffff8120411e>] SyS_newlstat+0xe/0x10
          [15174.433864]  [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b
          [    0.000000] Initializing cgroup subsys cpuset
          [    0.000000] Initializing cgroup subsys cpu
          [    0.000000] Initializing cgroup subsys cpuacct
          [    0.000000] Linux version 3.10.0-514.21.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Thu May 25 17:04:51 UTC 2017
          [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-514.21.1.el7.x86_64 root=UUID=f513f02d-54c7-42d0-bc3f-1a4b0d47ac07 ro console=tty0 LANG=en_US.UTF-8 console=ttyS0,115200 net.ifnames=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0 elfcorehdr=867700K
          
          sarah Sarah Liu added a comment - Bobi, Your debug patch is based on Jinshan's https://review.whamcloud.com/#/c/27148/ , with his patch, it is not easy to reproduce. So I added the line back to test-framework.sh and reproduce it. running: conf-sanity ONLY=46a run_suite conf-sanity /usr/lib64/lustre/tests/conf-sanity.sh [15069.537681] Lustre: DEBUG MARKER: -----============= acceptance-small: conf-sanity ============----- Thu Jun 15 21:38:44 UTC 2017 [15069.902165] Lustre: DEBUG MARKER: excepting tests: 32newtarball 101 [15070.034791] Lustre: Unmounted lustre-client [15119.059428] Key type lgssc unregistered [15119.059933] Lustre: 18766:0:(gss_mech_switch.c:81:lgss_mech_unregister()) Unregister krb5 mechanism [15124.614091] LNet: Removed LNI 10.9.6.136@tcp [15125.631080] LNet: HW nodes: 1, HW CPU cores: 4, npartitions: 1 [15125.634023] alg: No test for adler32 (adler32-zlib) [15125.634592] alg: No test for crc32 (crc32-table) [15126.534615] Lustre: DEBUG MARKER: == conf-sanity test 46a: handle ost additional - wide striped file =================================== 21:39:40 (1497562780) [15128.207906] Lustre: Lustre: Build Version: 2.9.59_16_g233342f [15128.232085] LNet: Added LNI 10.9.6.136@tcp [8/256/0/180] [15128.232728] LNet: Accept secure, port 988 [15129.739865] Lustre: 19557:0:(gss_svc_upcall.c:1178:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active [15129.741378] Lustre: 19557:0:(gss_mech_switch.c:72:lgss_mech_register()) Register gssnull mechanism [15129.742225] Key type lgssc registered [15129.794743] Lustre: Echo OBD driver; http://www.lustre.org/ [15167.777159] Lustre: Mounted lustre-client [15173.996457] Lustre: DEBUG MARKER: trevis-58vm1.trevis.hpdd.intel.com: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff8800b4796800.ost_server_uuid 40 [15174.200433] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-ffff8800b4796800.ost_server_uuid in FULL state after 0 sec [15174.277151] Lustre: Mounted lustre-client [15174.378436] LustreError: 21918:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1 [15174.379473] Lustre: 21918:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1 [15174.380433] Lustre: 21918:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0 [15174.381483] Lustre: 21918:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34 [15174.382314] Lustre: 21918:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2 [15174.383184] LustreError: 21918:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22 [15174.384297] LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) ASSERTION( rc == 0 ) failed: [15174.385191] LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) LBUG Message from syslogd@trevis-58vm1 at Jun 15 21:40:29 ... kernel:LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) ASSERTION( rc == 0 ) failed: Message from syslogd@trevis-58vm1 at Jun 15 21:40:29 ... kernel:LustreError: 21918:0:(llite_lib.c:1811:ll_update_inode()) LBUG [15174.387462] Pid: 21918, comm: stat [15174.387797] [15174.387797] Call Trace: [15174.388182] [<ffffffffa05467ee>] libcfs_call_trace+0x4e/0x60 [libcfs] [15174.388817] [<ffffffffa054687c>] lbug_with_loc+0x4c/0xb0 [libcfs] [15174.389508] [<ffffffffa0b6577f>] ll_update_inode+0x36f/0x620 [lustre] [15174.390133] [<ffffffff81219483>] ? inode_sb_list_add+0x43/0x50 [15174.390717] [<ffffffffa0b65a97>] ll_read_inode2+0x67/0x420 [lustre] [15174.391397] [<ffffffffa0b741fb>] ll_iget+0xab/0x350 [lustre] [15174.391956] [<ffffffffa0b67ad2>] ll_prep_inode+0x232/0xc80 [lustre] [15174.392599] [<ffffffffa08cb580>] ? lustre_msg_buf+0x0/0x60 [ptlrpc] [15174.393298] [<ffffffffa0b73a20>] ? ll_md_blocking_ast+0x0/0x730 [lustre] [15174.393957] [<ffffffffa08cf4e0>] ? lustre_swab_mdt_body+0x0/0x140 [ptlrpc] [15174.394632] [<ffffffffa0b749a9>] ll_lookup_it_finish+0x109/0x1090 [lustre] [15174.395373] [<ffffffffa060cc10>] ? lmv_intent_lock+0x12d0/0x1b50 [lmv] [15174.395998] [<ffffffff81114b82>] ? from_kgid+0x12/0x20 [15174.396518] [<ffffffffa0b74504>] ? ll_i2gids+0x24/0xb0 [lustre] [15174.397147] [<ffffffff81114b82>] ? from_kgid+0x12/0x20 [15174.397660] [<ffffffffa0b73a20>] ? ll_md_blocking_ast+0x0/0x730 [lustre] [15174.398373] [<ffffffffa0b761ce>] ll_lookup_it+0x89e/0xee0 [lustre] [15174.398983] [<ffffffffa0b77b9b>] ll_lookup_nd+0xbb/0x190 [lustre] [15174.399579] [<ffffffff81208add>] lookup_real+0x1d/0x50 [15174.400159] [<ffffffff81209452>] __lookup_hash+0x42/0x60 [15174.400684] [<ffffffff816841f2>] lookup_slow+0x42/0xa7 [15174.401252] [<ffffffff8120c973>] path_lookupat+0x773/0x7a0 [15174.401783] [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0 [15174.402346] [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0 [15174.402945] [<ffffffff8120c9cb>] filename_lookup+0x2b/0xc0 [15174.403480] [<ffffffff812103e7>] user_path_at_empty+0x67/0xc0 [15174.404092] [<ffffffff812104eb>] ? do_filp_open+0x4b/0xb0 [15174.404623] [<ffffffff81210451>] user_path_at+0x11/0x20 [15174.405174] [<ffffffff812038c3>] vfs_fstatat+0x63/0xc0 [15174.405679] [<ffffffff81203e91>] SYSC_newlstat+0x31/0x60 [15174.406259] [<ffffffff81692e66>] ? trace_do_page_fault+0x56/0x150 [15174.406853] [<ffffffff8169250b>] ? do_async_page_fault+0x1b/0xd0 [15174.407435] [<ffffffff8168eff8>] ? async_page_fault+0x28/0x30 [15174.408068] [<ffffffff8120411e>] SyS_newlstat+0xe/0x10 [15174.408581] [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b [15174.409197] [15174.409364] Kernel panic - not syncing: LBUG [15174.409818] CPU: 3 PID: 21918 Comm: stat Tainted: G OE ------------ 3.10.0-514.21.1.el7.x86_64 #1 [15174.410752] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [15174.411297] ffffffffa0564e8b 000000000694263c ffff880081abf820 ffffffff81686f13 [15174.412073] ffff880081abf8a0 ffffffff8168031a ffffffff00000008 ffff880081abf8b0 [15174.412848] ffff880081abf850 000000000694263c 000000000694263c ffff88013fd8f838 [15174.413621] Call Trace: [15174.413871] [<ffffffff81686f13>] dump_stack+0x19/0x1b [15174.414358] [<ffffffff8168031a>] panic+0xe3/0x1f2 [15174.414825] [<ffffffffa0546894>] lbug_with_loc+0x64/0xb0 [libcfs] [15174.415415] [<ffffffffa0b6577f>] ll_update_inode+0x36f/0x620 [lustre] [15174.416038] [<ffffffff81219483>] ? inode_sb_list_add+0x43/0x50 [15174.416601] [<ffffffffa0b65a97>] ll_read_inode2+0x67/0x420 [lustre] [15174.417216] [<ffffffffa0b741fb>] ll_iget+0xab/0x350 [lustre] [15174.417773] [<ffffffffa0b67ad2>] ll_prep_inode+0x232/0xc80 [lustre] [15174.418414] [<ffffffffa08cb580>] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc] [15174.419074] [<ffffffffa0b73a20>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre] [15174.419852] [<ffffffffa08cf4e0>] ? lustre_swab_generic_32s+0x20/0x20 [ptlrpc] [15174.420543] [<ffffffffa0b749a9>] ll_lookup_it_finish+0x109/0x1090 [lustre] [15174.421200] [<ffffffffa060cc10>] ? lmv_intent_lock+0x12d0/0x1b50 [lmv] [15174.421830] [<ffffffff81114b82>] ? from_kgid+0x12/0x20 [15174.422328] [<ffffffffa0b74504>] ? ll_i2gids+0x24/0xb0 [lustre] [15174.422904] [<ffffffff81114b82>] ? from_kgid+0x12/0x20 [15174.423403] [<ffffffffa0b73a20>] ? ll_invalidate_negative_children+0x1d0/0x1d0 [lustre] [15174.424179] [<ffffffffa0b761ce>] ll_lookup_it+0x89e/0xee0 [lustre] [15174.424789] [<ffffffffa0b77b9b>] ll_lookup_nd+0xbb/0x190 [lustre] [15174.425368] [<ffffffff81208add>] lookup_real+0x1d/0x50 [15174.425864] [<ffffffff81209452>] __lookup_hash+0x42/0x60 [15174.426378] [<ffffffff816841f2>] lookup_slow+0x42/0xa7 [15174.426878] [<ffffffff8120c973>] path_lookupat+0x773/0x7a0 [15174.427409] [<ffffffff811de665>] ? kmem_cache_alloc+0x35/0x1e0 [15174.427971] [<ffffffff8120f2bf>] ? getname_flags+0x4f/0x1a0 [15174.428504] [<ffffffff8120c9cb>] filename_lookup+0x2b/0xc0 [15174.429036] [<ffffffff812103e7>] user_path_at_empty+0x67/0xc0 [15174.429587] [<ffffffff812104eb>] ? do_filp_open+0x4b/0xb0 [15174.430112] [<ffffffff81210451>] user_path_at+0x11/0x20 [15174.430631] [<ffffffff812038c3>] vfs_fstatat+0x63/0xc0 [15174.431135] [<ffffffff81203e91>] SYSC_newlstat+0x31/0x60 [15174.431648] [<ffffffff81692e66>] ? trace_do_page_fault+0x56/0x150 [15174.432238] [<ffffffff8169250b>] ? do_async_page_fault+0x1b/0xd0 [15174.432815] [<ffffffff8168eff8>] ? async_page_fault+0x28/0x30 [15174.433365] [<ffffffff8120411e>] SyS_newlstat+0xe/0x10 [15174.433864] [<ffffffff816975c9>] system_call_fastpath+0x16/0x1b [ 0.000000] Initializing cgroup subsys cpuset [ 0.000000] Initializing cgroup subsys cpu [ 0.000000] Initializing cgroup subsys cpuacct [ 0.000000] Linux version 3.10.0-514.21.1.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Thu May 25 17:04:51 UTC 2017 [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-514.21.1.el7.x86_64 root=UUID=f513f02d-54c7-42d0-bc3f-1a4b0d47ac07 ro console=tty0 LANG=en_US.UTF-8 console=ttyS0,115200 net.ifnames=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never disable_cpu_apicid=0 elfcorehdr=867700K
          sarah Sarah Liu added a comment -

          Sure, I will update the ticket if I get sth.

          sarah Sarah Liu added a comment - Sure, I will update the ticket if I get sth.

          People

            jay Jinshan Xiong (Inactive)
            maloo Maloo
            Votes:
            1 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: