Project

General

Profile

Actions

Bug #6274

open

yate is eating 260% CPU while doing nothing

Added by laforge 3 months ago. Updated 3 months ago.

Status:
New
Priority:
Low
Assignee:
Category:
tdm-core
Target version:
-
Start date:
11/25/2023
Due date:
% Done:

0%


Description

Something seems wrong if an idle yate doing nothing consumes about 260% CPU.

Looking at the yate threads, we can see that each Zap Worker consumes 10..12% of CPU while doing nothing:

This is quite weird, especially considering that osmo-e1d with 23 trunkdevs in parallel only uses 20% CPU - while actually processing packets/frames for all of those trunks.

a strace of such a single Zap Worker thread looks like this:

select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0
select(11, [10], NULL, [10], {tv_sec=0, tv_usec=100}) = 0 (Timeout)
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=0}, NULL) = 0

So basically we're using select to check if data on a single FD has arrived, time-out after 100 microseconds and then call a zero-nanoseconds sleep and start all-over again.

Looking at the source code, this looks like extremely poor programming style:

// Process incoming data
bool ZapInterface::process()
{           
    if (!m_device.select(100))
        return false;
    if (!m_device.canRead()) {
        if (m_device.event())
            checkEvents();
        return false;
    }

    int r = m_device.recv(m_buffer,m_bufsize + ZAP_CRC_LEN);
    if (r == -1) {
        if (m_device.event())
            checkEvents();
        return false;
    }
    if (r < ZAP_CRC_LEN + 1) {
        Debug(this,DebugMild,"Short read %u bytes (with CRC) [%p]",r,this);
        return false;
    }

    s_ifaceNotifyMutex.lock();
    m_notify = 0;
    s_ifaceNotifyMutex.unlock();
    DataBlock packet(m_buffer,r - ZAP_CRC_LEN,false);
#ifdef XDEBUG    
    String hex;  
    hex.hexify(packet.data(),packet.length(),' ');
    Debug(this,DebugAll,"Received data: %s [%p]",hex.safe(),this);
#endif
    receivedPacket(packet);
    packet.clear(false);
    return true; 
}   

void ZapWorkerThread::run()
{   
    if (!m_client)
        return;
    DDebug(&plugin,DebugAll,"%s is running for client (%p): %s",
        s_threadName,m_client,m_address.c_str());
    while (true) {
        if (m_client->process())
            Thread::check(true);
        else
            Thread::yield(true);
    }
}   

So basically the thread is continuously calling Thread:yield rather than waiting on some actual I/O to happen.


Files

yate-cpu.png View yate-cpu.png 45.9 KB laforge, 11/25/2023 02:23 PM

No data to display

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)