[LU-1428] MDT service threads spinning in cfs_hash_for_each_relax() Created: 21/May/12 Updated: 23/Jul/12 Resolved: 23/Jul/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.1 |
| Fix Version/s: | Lustre 2.3.0, Lustre 2.1.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Ned Bass | Assignee: | Liang Zhen (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | libcfs, server |
| Environment: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 4587 |
| Description |
|
We have two MDT service threads using 100% CPU on a production MDS. I can't get a backtrace from crash because they do not yield the CPU, but based on oprofile they seem to be spinning in cfs_hash_for_each_relax(). At the same time we are seeing client hangs and high lock cancellation rates on the OSTs.

samples  %        image name  app name   symbol name
4225020  33.0708  libcfs.ko   libcfs.ko  cfs_hash_for_each_relax
3345225  26.1843  libcfs.ko   libcfs.ko  cfs_hash_hh_hhead
532409    4.1674  ptlrpc.ko   ptlrpc.ko  ldlm_cancel_locks_for_export_cb
307199    2.4046  ptlrpc.ko   ptlrpc.ko  lock_res_and_lock
175349    1.3725  vmlinux     vmlinux    native_read_tsc
151989    1.1897  ptlrpc.ko   ptlrpc.ko  ldlm_del_waiting_lock
136679    1.0698  libcfs.ko   libcfs.ko  cfs_hash_rw_lock
109269    0.8553  jbd2.ko     jbd2.ko    journal_clean_one_cp_list |
| Comments |
| Comment by Prakash Surya (Inactive) [ 21/May/12 ] |
|
And some more detailed information from the same oprofile data set is below. It's pretty clear that we're spending over 30% of the cpu time on this one line:

00000000000001dc 118121958 119737591 97.3140 98.6450 linux-tracefile.c:195

Expanding that line number out we get: libcfs/libcfs/linux/linux-tracefile.c:
--------------------------------------
183 void
184 cfs_set_ptldebug_header(struct ptldebug_header *header,
185 struct libcfs_debug_msg_data *m,
186 unsigned long stack)
187 {
188 struct timeval tv;
189
190 do_gettimeofday(&tv);
191
192 header->ph_subsys = m->msg_subsys;
193 header->ph_mask = m->msg_mask;
194 header->ph_cpu_id = cfs_smp_processor_id();
195 header->ph_type = cfs_trace_buf_idx_get();
196 header->ph_sec = (__u32)tv.tv_sec;
197 header->ph_usec = tv.tv_usec;
198 header->ph_stack = stack;
199 header->ph_pid = current->pid;
200 header->ph_line_num = m->msg_line;
201 header->ph_extern_pid = 0;
202 return;
203 }
The top offenders:

vma samples cum. samples % cum. % linenr info image name app name symbol name
0000000000000170 121382292 121382292 38.3297 38.3297 linux-tracefile.c:187 libcfs.ko libcfs.ko cfs_set_ptldebug_header
0000000000000170 13319 13319 0.0110 0.0110 linux-tracefile.c:187
0000000000000171 3821 17140 0.0031 0.0141 linux-tracefile.c:187
0000000000000174 12019 29159 0.0099 0.0240 linux-tracefile.c:187
0000000000000178 1581 30740 0.0013 0.0253 linux-tracefile.c:187
000000000000017c 12110 42850 0.0100 0.0353 linux-tracefile.c:187
0000000000000184 5846 48696 0.0048 0.0401 linux-tracefile.c:187
0000000000000189 23130 71826 0.0191 0.0592 linux-tracefile.c:187
0000000000000190 3268 75094 0.0027 0.0619 linux-tracefile.c:187
0000000000000193 12462 87556 0.0103 0.0721 linux-tracefile.c:187
0000000000000196 1 87557 8.2e-07 0.0721 linux-tracefile.c:190
00000000000001a0 23924 111481 0.0197 0.0918 linux-tracefile.c:192
00000000000001a3 48882 160363 0.0403 0.1321 linux-tracefile.c:193
00000000000001a8 3 160366 2.5e-06 0.1321 linux-tracefile.c:193
00000000000001ab 5 160371 4.1e-06 0.1321 linux-tracefile.c:194
00000000000001b3 13156 173527 0.0108 0.1430 linux-tracefile.c:194
00000000000001b7 11 173538 9.1e-06 0.1430 thread_info.h:216
00000000000001c0 43 173581 3.5e-05 0.1430 linux-tracefile.c:142
00000000000001c7 43006 216587 0.0354 0.1784 linux-tracefile.c:142
00000000000001cc 366 216953 3.0e-04 0.1787 linux-tracefile.c:142
00000000000001d2 52496 269449 0.0432 0.2220 linux-tracefile.c:142
00000000000001d4 199906 469355 0.1647 0.3867 thread_info.h:216
00000000000001d6 836 470191 6.9e-04 0.3874 thread_info.h:216
00000000000001d9 1145442 1615633 0.9437 1.3310 thread_info.h:216
00000000000001dc 118121958 119737591 97.3140 98.6450 linux-tracefile.c:195
00000000000001e0 744 119738335 6.1e-04 98.6456 linux-tracefile.c:196
00000000000001e4 1841 119740176 0.0015 98.6472 linux-tracefile.c:198
00000000000001e8 34073 119774249 0.0281 98.6752 linux-tracefile.c:203
00000000000001ec 90 119774339 7.4e-05 98.6753 linux-tracefile.c:196
00000000000001ef 13000 119787339 0.0107 98.6860 linux-tracefile.c:197
00000000000001f3 1558552 121345891 1.2840 99.9700 linux-tracefile.c:197
00000000000001f7 14 121345905 1.2e-05 99.9700 current.h:14
0000000000000200 105 121346010 8.7e-05 99.9701 linux-tracefile.c:199
0000000000000206 10931 121356941 0.0090 99.9791 linux-tracefile.c:199
0000000000000209 7 121356948 5.8e-06 99.9791 linux-tracefile.c:200
000000000000020e 184 121357132 1.5e-04 99.9793 linux-tracefile.c:201
0000000000000215 12991 121370123 0.0107 99.9900 linux-tracefile.c:203
0000000000000219 1 121370124 8.2e-07 99.9900 linux-tracefile.c:200
000000000000021c 187 121370311 1.5e-04 99.9901 linux-tracefile.c:203
0000000000000220 11753 121382064 0.0097 99.9998 linux-tracefile.c:203
0000000000000221 228 121382292 1.9e-04 100.000 linux-tracefile.c:203
0000000000000030 48149841 169532133 15.2046 53.5344 l_lock.c:54 ptlrpc.ko ptlrpc.ko lock_res_and_lock
0000000000000030 1844 1844 0.0038 0.0038 l_lock.c:54
0000000000000031 2957 4801 0.0061 0.0100 l_lock.c:54
0000000000000034 1895 6696 0.0039 0.0139 l_lock.c:54
0000000000000035 7 6703 1.5e-05 0.0139 l_lock.c:54
0000000000000039 728007 734710 1.5120 1.5259 l_lock.c:54
000000000000003e 5 734715 1.0e-05 1.5259 l_lock.c:56
0000000000000045 895 735610 0.0019 1.5278 l_lock.c:54
0000000000000048 7838 743448 0.0163 1.5440 l_lock.c:56
0000000000000053 54 743502 1.1e-04 1.5441 lustre_dlm.h:1228
0000000000000057 653 744155 0.0014 1.5455 lustre_dlm.h:1228
000000000000005b 1514 745669 0.0031 1.5486 lustre_dlm.h:1228
0000000000000060 47396462 48142131 98.4353 99.9840 lustre_dlm.h:1228
0000000000000064 649 48142780 0.0013 99.9853 l_lock.c:61
0000000000000068 4919 48147699 0.0102 99.9956 l_lock.c:61
0000000000000069 164 48147863 3.4e-04 99.9959 l_lock.c:61
000000000000006a 1978 48149841 0.0041 100.000 l_lock.c:61
ffffffff81035630 27003722 196535855 8.5272 62.0615 hpet.c:772 vmlinux vmlinux read_hpet
ffffffff81035630 18056 18056 0.0669 0.0669 hpet.c:772
ffffffff81035631 22548 40604 0.0835 0.1504 hpet.c:772
ffffffff81035634 2 40606 7.4e-06 0.1504 hpet.c:772
ffffffff81035640 15796 56402 0.0585 0.2089 io.h:22
ffffffff81035646 1412635 1469037 5.2313 5.4401 io.h:22
ffffffff81035648 25518796 26987833 94.5010 99.9412 hpet.c:774
ffffffff81035649 15889 27003722 0.0588 100.000 hpet.c:774
000000000000e8c0 15479461 212015316 4.8881 66.9496 hash.c:1574 libcfs.ko libcfs.ko cfs_hash_for_each_relax
000000000000e8c1 3361 3361 0.0217 0.0217 hash.c:1574
000000000000e8c8 1832 5193 0.0118 0.0335 hash.c:1574
000000000000e8cd 1917 7110 0.0124 0.0459 hash.c:1574
000000000000e8d6 1854 8964 0.0120 0.0579 libcfs_debug.h:199
000000000000e8dd 1 8965 6.5e-06 0.0579 hash.c:1574
000000000000e8e0 16 8981 1.0e-04 0.0580 hash.c:1574
000000000000e8e4 2028 11009 0.0131 0.0711 hash.c:1574
000000000000e8e8 12 11021 7.8e-05 0.0712 libcfs_debug.h:199
000000000000e8f1 172 11193 0.0011 0.0723 libcfs_debug.h:199
000000000000e8f7 613 11806 0.0040 0.0763 libcfs_hash.h:454
000000000000e8fb 1832 13638 0.0118 0.0881 hash.c:1585
000000000000e908 264 13902 0.0017 0.0898 hash.c:1585
000000000000e90e 1836 15738 0.0119 0.1017 libcfs_hash.h:574
000000000000e912 123 15861 7.9e-04 0.1025 libcfs_hash.h:574
000000000000e917 1864 17725 0.0120 0.1145 libcfs_hash.h:574
000000000000e919 3816 21541 0.0247 0.1392 hash.c:1589
000000000000e91d 150 21691 9.7e-04 0.1401 hash.c:1589
000000000000e923 1964 23655 0.0127 0.1528 hash.c:1591
000000000000e931 1953 25608 0.0126 0.1654 hash.c:1591
000000000000e93e 1929 27537 0.0125 0.1779 hash.c:1591
000000000000e942 13 27550 8.4e-05 0.1780 hash.c:1591
000000000000e949 5 27555 3.2e-05 0.1780 hash.c:1591
000000000000e950 2283 29838 0.0147 0.1928 libcfs_hash.h:592
000000000000e954 85590 115428 0.5529 0.7457 libcfs_hash.h:592
000000000000e956 2 115430 1.3e-05 0.7457 libcfs_hash.h:592
000000000000e959 190519 305949 1.2308 1.9765 libcfs_hash.h:637
000000000000e95d 44565 350514 0.2879 2.2644 libcfs_hash.h:637
000000000000e960 2 350516 1.3e-05 2.2644 hash.c:1597
000000000000e967 54428 404944 0.3516 2.6160 libcfs_hash.h:637
000000000000e96a 1257 406201 0.0081 2.6241 libcfs_hash.h:637
000000000000e970 1658681 2064882 10.7154 13.3395 libcfs_hash.h:683
000000000000e974 61327 2126209 0.3962 13.7357 libcfs_hash.h:683
000000000000e977 1453 2127662 0.0094 13.7451 libcfs_hash.h:683
000000000000e97a 1713200 3840862 11.0676 24.8126 libcfs_hash.h:683
000000000000e97c 2480136 6320998 16.0221 40.8347 hash.c:1597
000000000000e97f 6860 6327858 0.0443 40.8791 hash.c:1597
000000000000e985 1736065 8063923 11.2153 52.0943 hash.c:1598
000000000000e988 86 8064009 5.6e-04 52.0949 hash.c:1598
000000000000e98b 8691 8072700 0.0561 52.1510 hash.c:1598
000000000000e991 1788 8074488 0.0116 52.1626 hash.c:1598
000000000000e995 37138 8111626 0.2399 52.4025 hash.c:1598
000000000000e999 1 8111627 6.5e-06 52.4025 hash.c:1598
000000000000e9a0 101 8111728 6.5e-04 52.4032 libcfs_hash.h:560
000000000000e9a4 3117 8114845 0.0201 52.4233 libcfs_hash.h:560
000000000000e9ab 170 8115015 0.0011 52.4244 libcfs_hash.h:560
000000000000e9b1 9885 8124900 0.0639 52.4883 libcfs_hash.h:562
000000000000e9b7 19 8124919 1.2e-04 52.4884 libcfs_hash.h:562
000000000000e9ba 1643 8126562 0.0106 52.4990 libcfs_hash.h:562
000000000000e9bc 134 8126696 8.7e-04 52.4999 hash.c:1607
000000000000e9c1 114 8126810 7.4e-04 52.5006 libcfs_hash.h:574
000000000000e9c5 2729 8129539 0.0176 52.5182 libcfs_hash.h:574
000000000000e9c7 97 8129636 6.3e-04 52.5189 libcfs_hash.h:574
000000000000e9ca 222 8129858 0.0014 52.5203 hash.c:1607
000000000000e9ce 1475 8131333 0.0095 52.5298 libcfs_hash.h:574
000000000000e9d0 24743 8156076 0.1598 52.6897 libcfs_hash.h:592
000000000000e9d4 1606 8157682 0.0104 52.7000 libcfs_hash.h:592
000000000000e9d6 3 8157685 1.9e-05 52.7001 libcfs_hash.h:592
000000000000e9da 309 8157994 0.0020 52.7021 libcfs_hash.h:592
000000000000e9dd 3609 8161603 0.0233 52.7254 libcfs_hash.h:637
000000000000e9e1 2135 8163738 0.0138 52.7392 hash.c:1617
000000000000e9ea 1606 8165344 0.0104 52.7495 hash.c:1617
000000000000e9f6 1650 8166994 0.0107 52.7602 hash.c:1623
000000000000ea03 1644 8168638 0.0106 52.7708 hash.c:1598
000000000000ea0d 3468 8172106 0.0224 52.7932 libcfs_hash.h:546
000000000000ea13 1 8172107 6.5e-06 52.7932 libcfs_hash.h:546
000000000000ea16 1705 8173812 0.0110 52.8042 libcfs_hash.h:598
000000000000ea1a 82 8173894 5.3e-04 52.8048 libcfs_hash.h:598
000000000000ea20 1810 8175704 0.0117 52.8165 libcfs_hash.h:598
000000000000ea23 63 8175767 4.1e-04 52.8169 libcfs_hash.h:579
000000000000ea27 1751 8177518 0.0113 52.8282 libcfs_hash.h:579
000000000000ea2c 49 8177567 3.2e-04 52.8285 libcfs_hash.h:579
000000000000ea2f 60 8177627 3.9e-04 52.8289 hash.c:1604
000000000000ea33 1761 8179388 0.0114 52.8403 hash.c:1604
000000000000ea3c 1738 8181126 0.0112 52.8515 hash.c:1604
000000000000ea3f 177 8181303 0.0011 52.8526 hash.c:1605
000000000000ea42 28630 8209933 0.1850 53.0376 hash.c:1605
000000000000eaa5 1723469 9933402 11.1339 64.1715 hash.c:1597
000000000000eaa8 1718161 11651563 11.0996 75.2711 hash.c:1597
000000000000eaac 7189 11658752 0.0464 75.3176 hash.c:1597
000000000000eab2 2105171 13763923 13.5998 88.9173 hash.c:1597
000000000000eab4 2200 13766123 0.0142 88.9315 hash.c:1597
000000000000eab6 44 13766167 2.8e-04 88.9318 hash.c:1597
000000000000eabc 53076 13819243 0.3429 89.2747 libcfs_hash.h:598
000000000000eac0 1226196 15045439 7.9214 97.1961 libcfs_hash.h:598
000000000000eac2 53492 15098931 0.3456 97.5417 libcfs_hash.h:598
000000000000eac9 47528 15146459 0.3070 97.8487 hash.c:1591
000000000000eacd 54142 15200601 0.3498 98.1985 hash.c:1591
000000000000ead7 53856 15254457 0.3479 98.5464 hash.c:1591
000000000000eade 53717 15308174 0.3470 98.8935 hash.c:1591
000000000000eae0 1 15308175 6.5e-06 98.8935 hash.c:1591
000000000000eae4 9268 15317443 0.0599 98.9533 hash.c:1591
000000000000eae8 50483 15367926 0.3261 99.2795 hash.c:1591
000000000000eaec 1 15367927 6.5e-06 99.2795 hash.c:1591
000000000000eaef 320 15368247 0.0021 99.2815 hash.c:1591
000000000000eaf3 51637 15419884 0.3336 99.6151 hash.c:1591
000000000000eafd 49470 15469354 0.3196 99.9347 libcfs_hash.h:579
000000000000eb00 6 15469360 3.9e-05 99.9347 libcfs_hash.h:579
000000000000eb02 1854 15471214 0.0120 99.9467 libcfs_hash.h:579
000000000000eb05 189 15471403 0.0012 99.9479 hash.c:1632
000000000000eb08 1966 15473369 0.0127 99.9606 hash.c:1632
000000000000eb0d 1874 15475243 0.0121 99.9728 hash.c:1632
000000000000eb13 1918 15477161 0.0124 99.9851 hash.c:1632
000000000000eb16 1944 15479105 0.0126 99.9977 hash.c:1632
000000000000eb17 1 15479106 6.5e-06 99.9977 hash.c:1583
000000000000eb1e 169 15479275 0.0011 99.9988 hash.c:1583
000000000000eb25 2 15479277 1.3e-05 99.9988 hash.c:1583
000000000000eb31 184 15479461 0.0012 100.000 hash.c:1583 |
| Comment by Ned Bass [ 21/May/12 ] |
I think that's because you looked after I turned on full lustre debugging and I forgot to stop profiling. |
| Comment by Ned Bass [ 21/May/12 ] |
|
Uploading lustre debug log |
| Comment by Prakash Surya (Inactive) [ 21/May/12 ] |
|
Sure. Looking only at the cfs_hash_for_each_relax function, these jump out at me:

000000000000e970 1658681 2064882 10.7154 13.3395 libcfs_hash.h:683
...
000000000000e97a 1713200 3840862 11.0676 24.8126 libcfs_hash.h:683
000000000000e97c 2480136 6320998 16.0221 40.8347 hash.c:1597
...
000000000000e985 1736065 8063923 11.2153 52.0943 hash.c:1598
...
000000000000eaa5 1723469 9933402 11.1339 64.1715 hash.c:1597
000000000000eaa8 1718161 11651563 11.0996 75.2711 hash.c:1597
...
000000000000eab2 2105171 13763923 13.5998 88.9173 hash.c:1597
...
000000000000eac0 1226196 15045439 7.9214 97.1961 libcfs_hash.h:598

Which point to these lines:

1597 cfs_hash_bd_for_each_hlist(hs, &bd, hhead) {
...
1626 }
1627 cfs_hash_bd_unlock(hs, &bd, 0);
I'm not sure what to make of it though. My first guess was the old hash function but that might be ruled out as we have the |
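For reference, hash.c:1597 is the inner per-bucket walk of cfs_hash_for_each_relax(), and the export-eviction path presumably re-drives that walk until the export's lock hash is empty. A simplified, assumed sketch of that caller side is below (hs and exp are placeholder variables, and the loop shape is an illustration, not the actual libcfs source); it shows why an entry the callback can never remove turns into a 100% CPU spin on exactly these lines:

        /* assumed caller-side shape, for illustration only:
         * repeat the bucket walk until no live entries remain */
        while (cfs_hash_for_each_relax(hs, ldlm_cancel_locks_for_export_cb,
                                       exp) != 0)
                ; /* hash still reports entries: re-scan every bucket */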
| Comment by Ned Bass [ 21/May/12 ] |
|
We also had a similar occurrence around the same time on a separate Lustre cluster. In that case the ll_evictor thread on the MDS was spinning in the same function. |
| Comment by Peter Jones [ 22/May/12 ] |
|
Oleg, could you please comment on this one? Thanks, Peter |
| Comment by Bruno Faccini (Inactive) [ 23/May/12 ] |
|
We also encountered this same situation on one of our MDSes, which was not fully operational ("df" was working, but access to some sub-trees was hanging). In our case there were 4 threads eating 100% CPU but regularly re-schedule()'ing:

> 9735 2 18 ffff88187c3d87d0 RU 0.0 0 0 [ldlm_elt]

Since we could not find a way to gracefully recover from the situation, we decided to reboot the MDT and to take/force a crash-dump, where the thread stacks look the same as in our live Alt+SysRq+l attempts:

crash> bt 9735 9770 35489 46002
PID: 9770 TASK: ffff88205ada0100 CPU: 12 COMMAND: "ll_evictor"
PID: 35489 TASK: ffff881854ef2790 CPU: 1 COMMAND: "mdt_121"
PID: 46002 TASK: ffff88184cca97d0 CPU: 15 COMMAND: "mdt_445"

Thus my current opinion is that there seems to be a problem (a loop?) in the hash-list/struct that manages an export's locks. But I need to dig more into the crash-dump to conclude ... |
| Comment by Ned Bass [ 23/May/12 ] |
|
Bruno, Do you have the |
| Comment by Bruno Faccini (Inactive) [ 04/Jun/12 ] |
|
Thanks for pointing that out, Ned, the patch for |
| Comment by Liang Zhen (Inactive) [ 05/Jun/12 ] |
|
Looks like this one is the same as ldlm_cancel_locks_for_export_cb()->
ldlm_lock_cancel()->
ldlm_lock_destroy_nolock()->
ldlm_lock_destroy_internal()->cfs_hash_del()
we actually started to see this since:

We probably never really fixed this issue, even though we landed a patch on BZ 19557. I suspect it's because of this code in ldlm_lock_destroy_internal():
if (lock->l_destroyed) {
LASSERT(cfs_list_empty(&lock->l_lru));
EXIT;
return 0;
}
lock->l_destroyed = 1;
if (lock->l_export && lock->l_export->exp_lock_hash &&
!cfs_hlist_unhashed(&lock->l_exp_hash))
cfs_hash_del(lock->l_export->exp_lock_hash,
&lock->l_remote_handle, &lock->l_exp_hash);
lock->l_exp_hash should be protected by the internal lock of cfs_hash, but we call cfs_hlist_unhashed(&lock->l_exp_hash) without holding the cfs_hash lock. This means that if someone cancels this lock while export->exp_lock_hash is in the middle of rehashing (in the cfs_workitem thread context), there is a tiny window between deleting the lock from bucket[A] of l_exp_hash and re-adding it to bucket[B], during which cfs_hlist_unhashed(&lock->l_exp_hash) returns 1. In that window we destroy the lock but leave it on l_exp_hash forever, because we set lock::l_destroyed to 1 and ldlm_lock_destroy_internal() won't try the removal again even if it is called multiple times. Making a simple change to cfs_hash_del() and removing the above check could fix this issue; I can post a patch for this. |
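As a rough illustration of the change described above (the actual patch is in the review linked in the next comment; the bucket-level helper names in the second comment block are assumptions, not confirmed code): drop the unlocked cfs_hlist_unhashed() test in ldlm_lock_destroy_internal() and let cfs_hash_del() perform that test only after it holds the bucket lock, so a rehash in progress can no longer make the lock look unhashed.

        /* sketch only -- see the Gerrit change below for the real patch.
         * In ldlm_lock_destroy_internal(), call cfs_hash_del()
         * unconditionally; the unlocked cfs_hlist_unhashed() test is
         * what opens the race with a concurrent rehash: */
        if (lock->l_export && lock->l_export->exp_lock_hash)
                cfs_hash_del(lock->l_export->exp_lock_hash,
                             &lock->l_remote_handle, &lock->l_exp_hash);

        /* ...and inside cfs_hash_del(), check only under the bucket lock
         * (assumed shape, not the exact libcfs code):
         *
         *      cfs_hash_bd_lock(hs, &bd, 1);
         *      if (!cfs_hlist_unhashed(hnode))
         *              cfs_hash_bd_del_locked(hs, &bd, hnode);
         *      cfs_hash_bd_unlock(hs, &bd, 1);
         */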
| Comment by Liang Zhen (Inactive) [ 05/Jun/12 ] |
|
I've posted a patch for review: http://review.whamcloud.com/#change,3028 |
| Comment by Ned Bass [ 12/Jun/12 ] |
|
Thanks, we also need this patch for 2.1. |
| Comment by Bruno Faccini (Inactive) [ 19/Jun/12 ] |
|
Liang, I wanted to provide more details for this problem, and particularly to indicate that I feel pretty sure the looping situation will never clear because "orphaned" locks are still on the hash-lists, but it seems you have finally found the bug/window! So maybe it's too late, but just in case: digging in the server/client logs, it seems to me that at the start of the problem the affected client/export reported the following errors/msgs during MDT umount:

1339507111 2012 Jun 12 15:18:31 lascaux3332 kern err kernel LustreError: 57793:0:(lmv_obd.c:665:lmv_disconnect_mdc()) Target scratch2-MDT0000_UUID disconnect error -110

Also interesting is that this situation may have a "low" impact when it concerns a single client/export that has already unmounted and holds locks on a very specific part/directory of the filesystem! This explains why we found multiple mdt_<id> threads spinning in this state for days, with no problem reported or found when accessing/using the affected filesystem! |
| Comment by Andreas Dilger [ 19/Jun/12 ] |
|
Patch landed to master for Lustre 2.3.0, still needs to be landed to b2_1 for 2.1.3. |
| Comment by Jay Lan (Inactive) [ 29/Jun/12 ] |
|
NASA Ames hit this this morning on a 2.1.1 MDS. |
| Comment by Peter Jones [ 23/Jul/12 ] |
|
Landed for 2.1.3 and 2.3 |