Lustre / LU-17069

Crash when writing to a deactivated OSC

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.9
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This does not affect Lustre versions above 2.15.

      Call Trace:

      [12029.369503] BUG: unable to handle kernel NULL pointer dereference at           (null)  
      [12029.369605] IP: [<ffffffffc1041a67>] osc_lru_reserve+0x27/0x170 [osc]                  
      [12029.369702] PGD 800000006895f067 PUD 4a719067 PMD 0                                    
      [12029.369767] Oops: 0000 [#1] SMP                                                        
      ...
      [12029.389698] RIP: 0010:[<ffffffffc1041a67>]  [<ffffffffc1041a67>] osc_lru_reserve+0x27/0x170 [osc]         
      [12029.391039] RSP: 0018:ffff9756a8b83a70  EFLAGS: 00010206                                                  
      [12029.392217] RAX: ffff97567fb4b000 RBX: ffff9756ba9cab88 RCX: 0000000000000000                             
      [12029.393346] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9756818946c0                             
      [12029.394480] RBP: ffff9756a8b83a88 R08: 0000000000000001 R09: 0000000000000000                             
      [12029.395523] R10: ffff9756a4b6cf28 R11: 000000000000000f R12: ffff9756818946c0                             
      [12029.396445] R13: 0000000000020000 R14: ffff9756ba9cab88 R15: ffff975682b0ba00                             
      [12029.397361] FS:  00007efdb61e4740(0000) GS:ffff9756bfd00000(0000) knlGS:0000000000000000                  
      [12029.398336] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                             
      [12029.399246] CR2: 0000000000000000 CR3: 000000004cf86000 CR4: 00000000000606e0                             
      [12029.400145] Call Trace:                                                                                   
      [12029.400914]  [<ffffffffc1047ac6>] osc_io_write_iter_init+0x86/0x1e0 [osc]                                 
      [12029.401708]  [<ffffffffc0afb1bf>] cl_io_iter_init+0x5f/0x120 [obdclass]                                   
      [12029.402488]  [<ffffffffc10d4ff3>] lov_io_add_sub.isra.34+0xc3/0x380 [lov]                                 
      [12029.403245]  [<ffffffffc10d9967>] lov_io_iter_init+0x257/0x720 [lov]                                      
      [12029.403974]  [<ffffffffc10da38a>] lov_io_rw_iter_init+0x38a/0x520 [lov]                                   
      [12029.404717]  [<ffffffffc0afb1bf>] cl_io_iter_init+0x5f/0x120 [obdclass]                                   
      [12029.405401]  [<ffffffffc0afd4d2>] cl_io_loop+0x42/0x1c0 [obdclass]                                        
      [12029.406016]  [<ffffffffc169a3bb>] ll_file_io_generic+0x63b/0xc90 [lustre]                                 
      [12029.406637]  [<ffffffffc169aea9>] ll_file_aio_write+0x289/0x660 [lustre]                                  
      [12029.407251]  [<ffffffffc169b380>] ll_file_write+0x100/0x1c0 [lustre]                                      
      [12029.407850]  [<ffffffff9d44e590>] vfs_write+0xc0/0x1f0                                                    
      [12029.408448]  [<ffffffff9d9aaed5>] ? system_call_after_swapgs+0xa2/0x13a                                   
      [12029.409027]  [<ffffffff9d44f36f>] SyS_write+0x7f/0xf0                                                     
      [12029.409603]  [<ffffffff9d9aaed5>] ? system_call_after_swapgs+0xa2/0x13a                                   
      [12029.410153]  [<ffffffff9d9aaf92>] system_call_fastpath+0x25/0x2a                                          
      [12029.410649]  [<ffffffff9d9aaed5>] ? system_call_after_swapgs+0xa2/0x13a                                   
      

      Code:

      unsigned long osc_lru_reserve(struct client_obd *cli, unsigned long npages)             
      {                                                                                       
              unsigned long reserved = 0;                                                     
              unsigned long max_pages;                                                        
              unsigned long c;                                                                
                                                                                              
              /* reserve a full RPC window at most to avoid that a thread accidentally        
               * consumes too many LRU slots */                                               
              max_pages = cli->cl_max_pages_per_rpc * cli->cl_max_rpcs_in_flight;             
              if (npages > max_pages)                                                         
                      npages = max_pages;                                                     
                                                                                              
              c = atomic_long_read(cli->cl_lru_left);        /* <-- crash: cl_lru_left is NULL */
      ....
      

      Reproducer:

      1. client> lfs setstripe -c1 -i0 test
      2. mgs> lctl conf_param lustrefs-OST0000.osc.active=0
      3. client> umount /mnt/lustre && mount /mnt/lustre
      4. client> dd if=/dev/zero of=test bs=1M count=1 conv=notrunc
        ==> Crash

      The issue occurs only after the client has been remounted: in that case the client does not try to connect to the OST and does not initialize the LRU cache:

      lctl dl
      ...
      4 UP osc lustrefs-OST0000-osc-ffff8cb84915e000
      ...
      
      crash> p obd_devs[4]->u.cli.cl_lru_left      
      $3 = (atomic_long_t *) 0x0
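
      The NULL pointer shown above can be modeled in user space. The sketch below is not Lustre code and not a proposed patch: mock_client_obd and mock_lru_reserve are hypothetical names, the structure holds only the fields used here, and a plain long * stands in for atomic_long_t *. Under those assumptions, it illustrates how a guard on an uninitialized cl_lru_left would turn the dereference into a zero-slot reservation.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for struct client_obd (not the Lustre type). */
struct mock_client_obd {
	unsigned long cl_max_pages_per_rpc;
	unsigned long cl_max_rpcs_in_flight;
	long *cl_lru_left;	/* NULL when the LRU cache was never set up */
};

/*
 * Simplified reservation: clamp the request to one RPC window, then take
 * as many slots as are left. Returns the number of slots reserved.
 */
unsigned long mock_lru_reserve(struct mock_client_obd *cli, unsigned long npages)
{
	unsigned long max_pages;
	unsigned long reserved;

	/* Assumed guard: an OSC that never connected has no LRU counter. */
	if (cli == NULL || cli->cl_lru_left == NULL)
		return 0;

	max_pages = cli->cl_max_pages_per_rpc * cli->cl_max_rpcs_in_flight;
	if (npages > max_pages)
		npages = max_pages;

	/* Nothing left to hand out. */
	if (*cli->cl_lru_left <= 0)
		return 0;

	reserved = (unsigned long)*cli->cl_lru_left;
	if (reserved > npages)
		reserved = npages;
	*cli->cl_lru_left -= (long)reserved;
	return reserved;
}
```

      With cl_lru_left set to NULL, as in the crash dump, mock_lru_reserve() returns 0 instead of faulting. Whether returning 0 would be acceptable to the real callers is not something this ticket establishes.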
      

      Most jobs do not trigger the crash because they usually perform some other operation before a write (e.g. stat, seek, etc.). Those operations are not affected by this bug and return an EIO error (in osc_io_iter_init()).
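
      The difference between the two paths can be sketched in user space. This is a simplified model inferred from the call trace and the description, not Lustre code: mock_osc, mock_io_iter_init, mock_write_iter_init, and the connected flag are hypothetical names, and the condition that yields EIO is an assumption. It shows the write path touching the LRU counter before the generic check, so a write reaches the NULL pointer while the other operations fail with EIO first.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a deactivated OSC after a client remount. */
struct mock_osc {
	int connected;		/* 0: the client never connected to the OST */
	long *lru_left;		/* NULL: the LRU cache was never initialized */
};

enum mock_outcome {
	MOCK_OK = 0,
	MOCK_EIO = -EIO,	/* clean error returned to the caller */
	MOCK_CRASH = 1		/* stands in for the NULL pointer dereference */
};

/* Generic iteration init: rejects I/O on an OSC that is not connected. */
enum mock_outcome mock_io_iter_init(const struct mock_osc *osc)
{
	return osc->connected ? MOCK_OK : MOCK_EIO;
}

/* Write iteration init: reserves LRU slots first, then runs the generic init. */
enum mock_outcome mock_write_iter_init(const struct mock_osc *osc)
{
	if (osc->lru_left == NULL)
		return MOCK_CRASH;	/* the kernel would dereference NULL here */
	return mock_io_iter_init(osc);
}
```

      For an OSC with connected = 0 and lru_left = NULL, mock_io_iter_init() yields MOCK_EIO and mock_write_iter_init() yields MOCK_CRASH, which matches the behavior reported here.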

      This is the same crash as in LU-11658 (but the patches there did not fully fix the issue).

            People

              Assignee: eaujames Etienne Aujames
              Reporter: eaujames Etienne Aujames