
LU-8018: lov_init_raid0() ASSERTION( subdev != NULL ) failed: not init ost 0

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.9.0
    • Affects Version: Lustre 2.8.0
    • Environment: Lustre 2.8.0 + LLNL patches, both client and server
      TOSS 2 (RHEL 6.7 based)
      Kernel 2.6.32-573.22.1.1chaos.ch5.4.x86_64
    • Severity: 3

    Description

      Client crashes with an LBUG during a sys_creat() call.

      LustreError: 12025:0:(lov_object.c:278:lov_init_raid0()) ASSERTION( subdev != NULL ) failed: not init ost 0
      LustreError: 12025:0:(lov_object.c:278:lov_init_raid0()) LBUG                                              
      Pid: 12025, comm: mdtest                                                                                   
      
      Call Trace:
       [<ffffffffa04a48d5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa04a4ed7>] lbug_with_loc+0x47/0xb0 [libcfs]         
       [<ffffffffa0dcc729>] lov_init_raid0+0xde9/0x1140 [lov]        
       [<ffffffffa0dc9394>] lov_object_init+0x124/0x300 [lov]        
       [<ffffffffa09347ec>] ? lu_object_add+0x2c/0x30 [obdclass]     
       [<ffffffffa09371a8>] lu_object_alloc+0xd8/0x320 [obdclass]    
       [<ffffffffa093851d>] lu_object_find_try+0xcd/0x260 [obdclass] 
       [<ffffffffa04b48d2>] ? cfs_hash_bd_add_locked+0x62/0x90 [libcfs]
       [<ffffffffa0938761>] lu_object_find_at+0xb1/0xe0 [obdclass]     
       [<ffffffffa04b7f7a>] ? cfs_hash_find_or_add+0x9a/0x190 [libcfs] 
       [<ffffffffa09387cf>] lu_object_find_slice+0x1f/0x80 [obdclass]  
       [<ffffffffa093e455>] cl_object_find+0x55/0xc0 [obdclass]        
       [<ffffffffa0e6929e>] cl_file_inode_init+0x22e/0x340 [lustre]    
       [<ffffffffa0e368b4>] ll_update_inode+0x474/0x1d90 [lustre]      
       [<ffffffffa0915765>] ? lprocfs_counter_add+0x165/0x1c0 [obdclass]
       [<ffffffffa0b01f35>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]         
       [<ffffffffa0b29372>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]    
       [<ffffffffa0b29372>] ? __req_capsule_get+0x162/0x6e0 [ptlrpc]    
       [<ffffffff812352a0>] ? security_inode_alloc+0x40/0x60            
       [<ffffffffa0e32cc9>] ? ll_lli_init+0x209/0x2a0 [lustre]          
       [<ffffffffa0e3823d>] ll_read_inode2+0x6d/0x410 [lustre]          
       [<ffffffffa0e4eceb>] ll_iget+0x12b/0x330 [lustre]                
       [<ffffffffa0e3a0ed>] ll_prep_inode+0x5bd/0xc40 [lustre]          
       [<ffffffffa0e536de>] ? ll_lookup_it+0x63e/0xda0 [lustre]         
       [<ffffffffa0e546e8>] ll_create_nd+0x3b8/0xf20 [lustre]           
       [<ffffffffa0e247c6>] ? ll_inode_permission+0x136/0x3e0 [lustre]  
       [<ffffffff811a2b76>] vfs_create+0xe6/0x110                       
       [<ffffffff811a626e>] do_filp_open+0xa8e/0xd20                    
       [<ffffffff810f3a85>] ? call_rcu_sched+0x15/0x20                  
       [<ffffffff812a07cd>] ? strncpy_from_user+0x5d/0xa0               
       [<ffffffff811b3572>] ? alloc_fd+0x92/0x160                       
       [<ffffffff81190007>] do_sys_open+0x67/0x130                      
       [<ffffffff8118fe6d>] ? filp_close+0x5d/0x90                      
       [<ffffffff81190110>] sys_open+0x20/0x30                          
       [<ffffffff81190135>] sys_creat+0x15/0x20                         
       [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b
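
      For context on the crash path above: lov_init_raid0() walks the stripes
      of the new file's layout and looks up the client-side subdevice for each
      stripe's OST index. If an OST connection was never established, that
      slot is NULL and the LASSERTF() takes down the whole client instead of
      failing the single open. A self-contained userspace sketch of the 2.8
      pattern (illustrative names only, not the real Lustre types):

      #include <assert.h>
      #include <stdio.h>

      #define NUM_OSTS 4

      /* Stand-in for lov's per-OST subdevice table (ld_target[]);
       * a slot stays NULL if that OST connection never came up. */
      struct subdev { int ost_idx; };
      static struct subdev *ld_target[NUM_OSTS];

      static void init_raid0_stripe(int ost_idx)
      {
              struct subdev *subdev = ld_target[ost_idx];

              /* 2.8 behavior: assert, so one missing OST crashes the client. */
              assert(subdev != NULL && "not init ost");
              printf("stripe on ost %d ok\n", subdev->ost_idx);
      }

      int main(void)
      {
              /* OST 0 never connected, mirroring the LBUG above. */
              init_raid0_stripe(0);
              return 0;
      }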
      

      Attachments

        con-logs.tgz

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.9


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21411/
          Subject: LU-8018 lov: ld_target could be NULL
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 916b5f7f4672c1070e27a2fe0bfae371b0a729d6

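          The patch subject says it plainly: ld_target could be NULL. Without
          reproducing the exact diff here, the gist is to replace the
          assertion with graceful error handling, so a layout referencing an
          uninitialized OST fails that one object init with an error rather
          than LBUGging the client. A hedged sketch of that shape
          (illustrative only, not the verbatim commit):

          #include <errno.h>
          #include <stdio.h>

          #define NUM_OSTS 4

          struct subdev { int ost_idx; };
          static struct subdev *ld_target[NUM_OSTS];

          /* Post-fix shape: a NULL slot becomes an error returned up the
           * object-init path instead of an LBUG. Sketch only; see commit
           * 916b5f7f4672c1070e27a2fe0bfae371b0a729d6 for the real change. */
          static int init_raid0_stripe(int ost_idx)
          {
                  struct subdev *subdev = ld_target[ost_idx];

                  if (subdev == NULL) {
                          fprintf(stderr, "ost %d is not initialized\n", ost_idx);
                          return -EIO;
                  }
                  printf("stripe on ost %d ok\n", subdev->ost_idx);
                  return 0;
          }

          int main(void)
          {
                  int rc = init_raid0_stripe(0);
                  printf("open fails with rc = %d; the client stays up\n", rc);
                  return rc ? 1 : 0;
          }
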
          gerrit Gerrit Updater added a comment (edited) -

          Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/21411
          Subject: LU-8018 lov: ld_target could be NULL
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 6
          Commit: d3dceaca9964e65f0da4d05bbb3e93ee31a8a752

          ofaaland Olaf Faaland added a comment -

          Zhenyu,

          Those "Failing over" messages are a result of us umount-ing the server targets and then mounting them again. The first time at 8:34 as I recall was to see if OSTs would make the missing connections. In retrospect I should have looked at the config logs. The second time I don't recall the reason for.

          -Olaf

          bobijam Zhenyu Xu added a comment -

          From the logs, the system was not stable; several failovers happened between 08:34 and 09:06:

          [~/tmp/logs/lu8012/con-logs/] $ grep Failing *
          199.log:16:2016-04-12 08:34:22 Lustre: Failing over lustre-OST0000
          199.log:26:2016-04-12 09:05:47 Lustre: Failing over lustre-OST0000
          200.log:19:2016-04-12 09:05:47 Lustre: Failing over lustre-OST0001
          201.log:19:2016-04-12 09:05:47 Lustre: Failing over lustre-OST0002
          202.log:23:2016-04-12 09:05:47 Lustre: Failing over lustre-OST0003
          203.log:19:2016-04-12 09:05:47 Lustre: Failing over lustre-OST0004
          204.log:16:2016-04-12 08:34:22 Lustre: Failing over lustre-OST0005
          204.log:26:2016-04-12 09:05:47 Lustre: Failing over lustre-OST0005
          205.log:23:2016-04-12 09:05:47 Lustre: Failing over lustre-OST0006
          206.log:23:2016-04-12 09:05:47 Lustre: Failing over lustre-OST0007
          207.log:22:2016-04-12 09:05:47 Lustre: Failing over lustre-OST0008
          208.log:23:2016-04-12 09:05:47 Lustre: Failing over lustre-OST0009
          

          Can't tell what caused the failovers, though.


          jgmitter Joseph Gmitter (Inactive) added a comment -

          Hi Bobijam,
          Can you also follow up with this ticket?
          Thanks.
          Joe
          ofaaland Olaf Faaland added a comment -

          Attached the console logs as con-logs.tgz


          adilger Andreas Dilger added a comment -

          Olaf, any chance of posting the server console logs, so we can see why the initial MDS-OSS connections failed?

          People

            Assignee: bobijam Zhenyu Xu
            Reporter: ofaaland Olaf Faaland
            Votes: 0
            Watchers: 8