
I'm experiencing very strange behaviour with some hardware we're working on. My setup consists of Linux 4.9.87 w/ SMP & PREEMPT RT on a dual-core iMX6 CPU.

The application running on this setup consists of three threads using the SCHED_FIFO scheduling policy, with priorities 10, 20, and 30, all bound to CPU0. A fourth thread runs with the same scheduling policy and a priority of 90 on CPU1.
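For reference, this is roughly how such a thread setup can be created (a minimal sketch, not our actual application code; the helper name and the worker body are illustrative):

/* Minimal sketch: create a SCHED_FIFO thread with a given priority,
 * pinned to a single core. */
#define _GNU_SOURCE          /* for pthread_attr_setaffinity_np() */
#include <pthread.h>
#include <sched.h>

static void *worker(void *arg)
{
    (void)arg;
    /* ... cyclic real-time work ... */
    return NULL;
}

static int spawn_rt_thread(int cpu, int prio, pthread_t *tid)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = prio };
    cpu_set_t cpus;

    pthread_attr_init(&attr);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);
    /* Without this, pthread_create() ignores the policy/priority above. */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);

    CPU_ZERO(&cpus);
    CPU_SET(cpu, &cpus);
    pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);

    return pthread_create(tid, &attr, worker, NULL);
}

/* e.g. spawn_rt_thread(0, 10, &t1); spawn_rt_thread(0, 20, &t2);
 *      spawn_rt_thread(0, 30, &t3); spawn_rt_thread(1, 90, &t4); */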

The behaviour I'm experiencing is as follows:

While monitoring the process, top displays CPU0 as 97.4% idle (i.e. only 2.6% busy), yet the busiest thread bound to this core alone accounts for 8.4% - a discrepancy of 5.8%.

This is the top output showing the per-CPU and per-thread usage in Solaris mode:

top - 13:36:41 up  1:21,  2 users,  load average: 0.52, 0.39, 0.26
Threads: 128 total,   2 running, 126 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.3 us,  1.3 sy,  0.0 ni, 97.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :   509348 total,   438164 free,    46888 used,    24296 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   450196 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND      
  798 root     -91   0   37740  35656   3676 S  9.3  7.0   0:04.47 myprog       
  797 root     -31   0   37740  35656   3676 S  8.4  7.0   0:04.43 myprog       
  241 root     -51   0       0      0      0 R  4.7  0.0   3:55.11 irq/26-2014+ 
  242 root      rt   0       0      0      0 S  3.1  0.0   2:33.62 spi3         
  796 root     -21   0   37740  35656   3676 S  1.9  7.0   0:01.14 myprog       
  794 root      20   0    2848   1892   1536 R  1.0  0.4   0:01.04 top          
  795 root     -11   0   37740  35656   3676 S  0.3  7.0   0:00.37 myprog       
    3 root      20   0       0      0      0 S  0.2  0.0   0:09.14 ksoftirqd/0  
   10 root      20   0       0      0      0 S  0.2  0.0   0:00.14 rcuc/0       
  545 root     -51   0       0      0      0 S  0.2  0.0   0:03.74 irq/36-can0  
  595 root      20   0    1700    880    824 S  0.2  0.2   0:00.99 rngd         
    1 root      20   0    1712   1168   1104 S  0.0  0.2   0:01.34 init         
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.01 kthreadd     
    4 root      -2   0       0      0      0 S  0.0  0.0   0:07.99 ktimersoftd+ 
    5 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0  
    6 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H 

The per-thread usage appears to be correct; however, the totals don't match the per-CPU usage.

This is not an issue with top itself, as htop exhibits the same behaviour. Computing the usage directly from the values in /proc/stat also yields the same results as displayed by top and htop.
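For completeness, this is the kind of cross-check I did against /proc/stat (a minimal sketch: sample the cpu0 line twice and derive the busy percentage from the counter deltas; field order per man 5 proc is user nice system idle iowait irq softirq steal, and idle plus iowait are treated as not busy):

#include <stdio.h>
#include <unistd.h>

struct cpu_sample { unsigned long long busy, total; };

static int read_cpu0(struct cpu_sample *s)
{
    char line[256];
    unsigned long long v[8];
    FILE *f = fopen("/proc/stat", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "cpu0 %llu %llu %llu %llu %llu %llu %llu %llu",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]) == 8) {
            fclose(f);
            s->total = v[0] + v[1] + v[2] + v[3] + v[4] + v[5] + v[6] + v[7];
            s->busy  = s->total - v[3] - v[4];   /* everything but idle + iowait */
            return 0;
        }
    }
    fclose(f);
    return -1;
}

int main(void)
{
    struct cpu_sample a, b;

    if (read_cpu0(&a))
        return 1;
    sleep(1);                                    /* measurement window */
    if (read_cpu0(&b))
        return 1;
    printf("cpu0 busy: %.1f%%\n",
           100.0 * (double)(b.busy - a.busy) / (double)(b.total - a.total));
    return 0;
}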

The highest-priority thread (PID 798) is scheduled every 500 µs using clock_nanosleep(). Scheduling the thread every 10 ms instead dramatically lowered the overall CPU usage, but the numbers still did not match up and exhibited the same behaviour - this indicates that the issue is not caused by a cycle time as short as the one currently configured.
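The scheduling pattern is essentially the following (an illustrative sketch, assuming CLOCK_MONOTONIC with an absolute deadline so the period does not drift; the actual cyclic work is omitted):

#include <time.h>

#define PERIOD_NS    500000L         /* 500 µs */
#define NSEC_PER_SEC 1000000000L

static void periodic_loop(void)
{
    struct timespec next;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (;;) {
        next.tv_nsec += PERIOD_NS;   /* advance the absolute deadline */
        if (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec  += 1;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

        /* ... cyclic work ... */
    }
}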

Running my application on a stock distribution kernel w/o real-time patch yields similar results - also indicating that the issue is not caused by the real-time patch.

Running the application on a single-core CPU once again yields similar results, indicating that the issue is not caused by the multi-core setup.

  • Has anybody experienced something similar before?
  • Does anyone have an idea why the sums of the per-thread CPU usages do not match the per-CPU totals?
  • How can I get reliable values for the total CPU usage?

Thanks in advance!

EDIT

After further discussing the issue internally, we suspect it might be an issue of undersampling.

If we assume the kernel samples the state of each CPU (user space, kernel space, or idle) at regular intervals, the results become more accurate the more samples are taken relative to how often my threads are scheduled. For example, at 100 samples per second a thread woken every 500 µs runs and sleeps many times between two consecutive samples, so whether it gets charged for an interval depends entirely on whether it happens to be on the CPU at the sampling instant.

Further assuming the samples are taken on timer ticks, I increased the kernel's CONFIG_HZ from 100 to 1000 and modified my application so that my threads are scheduled every 100 ms. To hold the CPU load at roughly 50%, some CPU time is wasted in a loop that increments a counter variable. Afterwards I continually decreased the scheduling interval of my threads, while adapting the delay loop to keep the load at roughly 50%.
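The delay loop is essentially the following (a sketch; ITERATIONS is a placeholder that has to be calibrated per board and per interval so the loop burns roughly half of each period):

static volatile unsigned long counter;   /* volatile: keep the loop from being optimized out */

static void burn_cpu(unsigned long iterations)
{
    unsigned long i;

    for (i = 0; i < iterations; i++)
        counter++;                       /* deliberately wasted work */
}

/* Called once per wakeup from a periodic loop like the one sketched above:
 *     burn_cpu(ITERATIONS);            // tuned to take ~half the period
 */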

My tests led to the following conclusions: with scheduling intervals of 50 ms and 100 ms, the values reported by top appear to be correct.

At a scheduling interval of 10 ms, discrepancies of 2-4% start to show. Reducing the interval further to 5 ms increases the discrepancy to up to 12%. Finally, reducing the interval to 2 ms completely throws off the top output: top shows an idle time of 100% while the application is actually utilising 33% of the CPU time.

This seems to confirm our speculation that the issue is caused by undersampling.

Can anyone confirm this? Does anybody know how the CPU usage is measured by the kernel?

1 Answer


The numbers shown are estimates, most of them obtained by sampling. They are good for a general overview, but useless for down-to-the-decimals micromanagement. Measuring "CPU usage" is mostly useless on its own; you can push it near 100% by overloading the system (and get next to no work done). Define overall objectives in terms of your workload's performance, then measure and tweak if needed.

Note that by increasing CONFIG_HZ you increase the kernel overhead (but presumably get better interactive response). The default values are decent compromises for "typical" workloads. Change them if you have hard evidence that another value is better for your use case.

