Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9485

conf-sanity test_46a: cl_file_inode_init()) Failure to initialize cl object

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0
    • Lustre 2.10.0
    • None
    • FSTYPE=zfs
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for John Hammond <john.hammond@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/2d9acc98-3546-11e7-b0a8-5254006e85c2.

      The sub-test test_46a failed with the following error:

      test failed to respond and timed out
      

      From the client:

      01:20:15:[ 8727.094028] Lustre: Mounted lustre-client
      01:20:15:[ 8727.326631] LustreError: 14973:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
      01:20:15:[ 8727.329204] Lustre: 14973:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
      01:20:15:[ 8727.331709] Lustre: 14973:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
      01:20:15:[ 8727.334195] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
      01:20:15:[ 8727.336551] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
      01:20:15:[ 8727.338877] LustreError: 14973:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22
      01:20:15:[ 8727.343166] Lustre: 14973:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
      01:20:15:[ 8727.345641] Lustre: 14973:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
      01:20:15:[ 8727.348135] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
      01:20:15:[ 8727.350427] Lustre: 14973:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
      01:20:15:[ 8727.352806] LustreError: 14973:0:(llite_lib.c:2256:ll_prep_inode()) new_inode -fatal: rc -22
      01:20:15:[ 8727.362035] BUG: unable to handle kernel paging request at ffffffa0c9b0b0ff
      01:20:15:[ 8727.363006] IP: [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
      01:20:15:[ 8727.363006] PGD 19bd067 PUD 0 
      01:20:15:[ 8727.363006] Oops: 0010 [#1] SMP 
      01:20:15:[ 8727.363006] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi ppdev crc32_pclmul ghash_clmulni_intel aesni_intel parport_pc parport lrw gf128mul glue_helper ablk_helper cryptd virtio_balloon i2c_piix4 pcspkr nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk cirrus crct10dif_pclmul crct10dif_common drm_kms_helper 8139too ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel serio_raw ttm virtio_pci virtio_ring virtio drm 8139cp i2c_core mii libata floppy [last unloaded: libcfs]
      01:20:15:[ 8727.363006] CPU: 0 PID: 14979 Comm: bash Tainted: G        W  OE  ------------   3.10.0-514.16.1.el7.x86_64 #1
      01:20:15:[ 8727.363006] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      01:20:15:[ 8727.363006] task: ffff88007ae76dd0 ti: ffff88007a240000 task.ti: ffff88007a240000
      01:20:15:[ 8727.363006] RIP: 0010:[<ffffffa0c9b0b0ff>]  [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
      01:20:15:[ 8727.363006] RSP: 0018:ffff88007fc03e78  EFLAGS: 00010286
      01:20:15:[ 8727.363006] RAX: ffffffa0c9b0b0ff RBX: ffffffff81a1a5c0 RCX: 0000000000019b00
      01:20:15:[ 8727.363006] RDX: ffff880067b8caa7 RSI: 0000000000000003 RDI: ffff880067b8caa7
      01:20:15:[ 8727.363006] RBP: ffff88007fc03ed0 R08: ffff88007b7b7e00 R09: 000000018010000f
      01:20:15:[ 8727.363006] R10: 0000000000000001 R11: ffffea0001ededc0 R12: 000000000000000a
      01:20:15:[ 8727.363006] R13: 0000000000000001 R14: ff8800658b5fb0ff R15: ffff88007fc101c0
      01:20:15:[ 8727.363006] FS:  00007f4f234d3740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
      01:20:15:[ 8727.363006] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      01:20:15:[ 8727.363006] CR2: ffffffa0c9b0b0ff CR3: 00000000782bb000 CR4: 00000000000406f0
      01:20:15:[ 8727.363006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      01:20:15:[ 8727.363006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      01:20:15:[ 8727.363006] Stack:
      01:20:15:[ 8727.363006]  ffffffff81139bbd ffff880066441800 0000000000000001 ffff88007fc101e8
      01:20:15:[ 8727.363006]  ffff88007ae76dd0 ffff88007a243fd8 0000000000000009 0000000000000009
      01:20:15:[ 8727.363006]  ffff880067d22f58 ffffffff819b3108 0000000000000001 ffff88007fc03f40
      01:20:15:[ 8727.363006] Call Trace:
      01:20:15:[ 8727.363006]  <IRQ> 
      01:20:15:[ 8727.363006]  [<ffffffff81139bbd>] ? rcu_process_callbacks+0x1dd/0x550
      01:20:15:[ 8727.363006]  [<ffffffff8108f63f>] __do_softirq+0xef/0x280
      01:20:15:[ 8727.363006]  [<ffffffff81698c1c>] call_softirq+0x1c/0x30
      01:20:15:[ 8727.363006]  [<ffffffff8102d365>] do_softirq+0x65/0xa0
      01:20:15:[ 8727.363006]  [<ffffffff8108f9d5>] irq_exit+0x115/0x120
      01:20:15:[ 8727.363006]  [<ffffffff81699895>] smp_apic_timer_interrupt+0x45/0x60
      01:20:15:[ 8727.363006]  [<ffffffff81697ddd>] apic_timer_interrupt+0x6d/0x80
      01:20:15:[ 8727.363006]  <EOI> 
      01:20:15:[ 8727.363006]  [<ffffffff8121ccf3>] ? dup_fd+0x153/0x2d0
      01:20:15:[ 8727.363006]  [<ffffffff8121ccbf>] ? dup_fd+0x11f/0x2d0
      01:20:15:[ 8727.363006]  [<ffffffff81083f70>] copy_process+0xc10/0x1960
      01:20:15:[ 8727.363006]  [<ffffffff81084e71>] do_fork+0x91/0x2c0
      01:20:16:[ 8727.363006]  [<ffffffff81085126>] SyS_clone+0x16/0x20
      01:20:16:[ 8727.363006]  [<ffffffff816974d9>] stub_clone+0x69/0x90
      01:20:16:[ 8727.363006]  [<ffffffff81697189>] ? system_call_fastpath+0x16/0x1b
      01:20:16:[ 8727.363006] Code:  Bad RIP value.
      01:20:16:[ 8727.363006] RIP  [<ffffffa0c9b0b0ff>] 0xffffffa0c9b0b0fe
      01:20:16:[ 8727.363006]  RSP <ffff88007fc03e78>
      01:20:16:[ 8727.363006] CR2: ffffffa0c9b0b0ff
      

      https://testing.hpdd.intel.com/test_logs/2f0f1c78-3546-11e7-b0a8-5254006e85c2/show_text

      Info required for matching: conf-sanity 46a

      Attachments

        Activity

          [LU-9485] conf-sanity test_46a: cl_file_inode_init()) Failure to initialize cl object
          pjones Peter Jones added a comment -

          Landed for 2.10

          pjones Peter Jones added a comment - Landed for 2.10

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27658/
          Subject: LU-9485 llite: check the return value of cl_file_inode_init()
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: ca41d88d9228e469a8728c20c21c7e919b77dc12

          gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27658/ Subject: LU-9485 llite: check the return value of cl_file_inode_init() Project: fs/lustre-release Branch: master Current Patch Set: Commit: ca41d88d9228e469a8728c20c21c7e919b77dc12
          bobijam Zhenyu Xu added a comment -

          yes, patchset 2 should be the fix patch, I'll update it and request reviews for future landing.

          bobijam Zhenyu Xu added a comment - yes, patchset 2 should be the fix patch, I'll update it and request reviews for future landing.
          pjones Peter Jones added a comment -

          Bobijam

          So patchset 2 was more than just a diagnostic patch - it was a possible (and now proven) fix? Does that mean that we should rollback to v2 of the patch and get reviews with a view to landing this patch?

          Peter

          pjones Peter Jones added a comment - Bobijam So patchset 2 was more than just a diagnostic patch - it was a possible (and now proven) fix? Does that mean that we should rollback to v2 of the patch and get reviews with a view to landing this patch? Peter
          bobijam Zhenyu Xu added a comment -

          that's a good news, meaning that the patch fixes the memory issue.

          bobijam Zhenyu Xu added a comment - that's a good news, meaning that the patch fixes the memory issue.
          sarah Sarah Liu added a comment - - edited

          I have run patchset #2 locally with the line added several times and cannot reproduce it this time, but the test failed as
          https://testing.hpdd.intel.com/test_sets/3cec9368-52e0-11e7-8a1b-5254006e85c2

          [19954.374299] Lustre: DEBUG MARKER: == conf-sanity test 46a: handle ost additional - wide striped file =================================== 21:51:35 (1497649895)
          [19956.043116] Lustre: Lustre: Build Version: 2.9.59_16_gbe2e8e5
          [19956.068256] LNet: Added LNI 10.9.6.136@tcp [8/256/0/180]
          [19956.068913] LNet: Accept secure, port 988
          [19957.576952] Lustre: 2814:0:(gss_svc_upcall.c:1178:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active
          [19957.578575] Lustre: 2814:0:(gss_mech_switch.c:72:lgss_mech_register()) Register gssnull mechanism
          [19957.579450] Key type lgssc registered
          [19957.631952] Lustre: Echo OBD driver; http://www.lustre.org/
          [19994.598809] Lustre: Mounted lustre-client
          [19999.823173] Lustre: DEBUG MARKER: trevis-58vm1.trevis.hpdd.intel.com: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff8800ab927800.ost_server_uuid 40
          [20000.027082] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-ffff8800ab927800.ost_server_uuid in FULL state after 0 sec
          [20000.102424] Lustre: Mounted lustre-client
          [20000.202406] LustreError: 5182:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1
          [20000.203449] Lustre: 5182:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1
          [20000.204419] Lustre: 5182:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0
          [20000.205502] Lustre: 5182:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34
          [20000.206337] Lustre: 5182:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2
          [20000.207177] LustreError: 5182:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22
          [20000.208389] LustreError: 5182:0:(llite_lib.c:2273:ll_prep_inode()) new_inode -fatal: rc -22
          [20000.368646] Lustre: DEBUG MARKER: conf-sanity test_46a: @@@@@@ FAIL: stat /mnt/lustre/widestripe failed
          [20002.819582] LustreError: 5512:0:(obd_config.c:1429:class_process_proc_param()) lov.: error writing proc entry 'stripecount': rc = -34
          [20008.013856] Lustre: Unmounted lustre-client
          conf-sanity returned 0
          Stopping clients: trevis-58vm1.trevis.hpdd.intel.com /mnt/lustre (opts:)
          Stopping clients: trevis-58vm1.trevis.hpdd.intel.com /mnt/lustre2 (opts:)
          Stopping /mnt/lustre-mds1 (opts:-f) on trevis-73vm11
          [20031.824103] Key type lgssc unregistered
          [20031.824613] Lustre: 6228:0:(gss_mech_switch.c:81:lgss_mech_unregister()) Unregister krb5 mechanism
          [20037.824058] LNet: Removed LNI 10.9.6.136@tcp
          modules unloaded.
          Finished at Fri Jun 16 21:53:00 UTC 2017 in 176s
          auster: completed with rc 0
          
          
          sarah Sarah Liu added a comment - - edited I have run patchset #2 locally with the line added several times and cannot reproduce it this time, but the test failed as https://testing.hpdd.intel.com/test_sets/3cec9368-52e0-11e7-8a1b-5254006e85c2 [19954.374299] Lustre: DEBUG MARKER: == conf-sanity test 46a: handle ost additional - wide striped file =================================== 21:51:35 (1497649895) [19956.043116] Lustre: Lustre: Build Version: 2.9.59_16_gbe2e8e5 [19956.068256] LNet: Added LNI 10.9.6.136@tcp [8/256/0/180] [19956.068913] LNet: Accept secure, port 988 [19957.576952] Lustre: 2814:0:(gss_svc_upcall.c:1178:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active [19957.578575] Lustre: 2814:0:(gss_mech_switch.c:72:lgss_mech_register()) Register gssnull mechanism [19957.579450] Key type lgssc registered [19957.631952] Lustre: Echo OBD driver; http://www.lustre.org/ [19994.598809] Lustre: Mounted lustre-client [19999.823173] Lustre: DEBUG MARKER: trevis-58vm1.trevis.hpdd.intel.com: executing wait_import_state FULL osc.lustre-OST0000-osc-ffff8800ab927800.ost_server_uuid 40 [20000.027082] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-ffff8800ab927800.ost_server_uuid in FULL state after 0 sec [20000.102424] Lustre: Mounted lustre-client [20000.202406] LustreError: 5182:0:(lov_ea.c:227:lsme_unpack()) lustre-clilov_UUID: OST index 1 more than OST count 1 [20000.203449] Lustre: 5182:0:(lov_pack.c:61:lov_dump_lmm_common()) objid 0x1:3024, magic 0x0bd10bd0, pattern 0x1 [20000.204419] Lustre: 5182:0:(lov_pack.c:65:lov_dump_lmm_common()) stripe_size 1048576, stripe_count 2, layout_gen 0 [20000.205502] Lustre: 5182:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 0 idx 0 subobj 0x0:34 [20000.206337] Lustre: 5182:0:(lov_pack.c:84:lov_dump_lmm_objects()) stripe 1 idx 1 subobj 0x0:2 [20000.207177] LustreError: 5182:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000bd0:0x1:0x0]: -22 [20000.208389] LustreError: 5182:0:(llite_lib.c:2273:ll_prep_inode()) new_inode -fatal: rc -22 [20000.368646] Lustre: DEBUG MARKER: conf-sanity test_46a: @@@@@@ FAIL: stat /mnt/lustre/widestripe failed [20002.819582] LustreError: 5512:0:(obd_config.c:1429:class_process_proc_param()) lov.: error writing proc entry 'stripecount': rc = -34 [20008.013856] Lustre: Unmounted lustre-client conf-sanity returned 0 Stopping clients: trevis-58vm1.trevis.hpdd.intel.com /mnt/lustre (opts:) Stopping clients: trevis-58vm1.trevis.hpdd.intel.com /mnt/lustre2 (opts:) Stopping /mnt/lustre-mds1 (opts:-f) on trevis-73vm11 [20031.824103] Key type lgssc unregistered [20031.824613] Lustre: 6228:0:(gss_mech_switch.c:81:lgss_mech_unregister()) Unregister krb5 mechanism [20037.824058] LNet: Removed LNI 10.9.6.136@tcp modules unloaded. Finished at Fri Jun 16 21:53:00 UTC 2017 in 176s auster: completed with rc 0
          sarah Sarah Liu added a comment -

          I update the patch base on yours with the problematic line reinstated and adding test-parameter.

          sarah Sarah Liu added a comment - I update the patch base on yours with the problematic line reinstated and adding test-parameter.

          People

            jay Jinshan Xiong (Inactive)
            maloo Maloo
            Votes:
            1 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: