What may be the cause of the elapsed-time spikes in a pthread critical section when the scheduling policy is SCHED_RR?


I am doing some time-measurement tests in Linux. My kernel is Preempt-RT (although a vanilla kernel gives similar results in my tests...).

I have two pthreads running concurrently on the same processor (affinity set). They are real-time threads (priority 99).

I have a critical section protected by a spin lock, where the two threads compete for the lock. Inside the critical section there is a single increment operation, and I try to measure the elapsed time of this operation.



Code example with __rdtsc:



pthread_spin_lock(&lock);

start_time = __rdtsc();
++cnt; //shared ram variable, type is unsigned long long
stop_time = __rdtsc();

pthread_spin_unlock(&lock);


Code example with chrono timer:



pthread_spin_lock(&lock);

auto _start_time = std::chrono::high_resolution_clock::now();
++cnt; //shared ram variable, type is unsigned long long
auto _stop_time = std::chrono::high_resolution_clock::now();

pthread_spin_unlock(&lock);


The threads run the loop a couple of million times and then terminate. After unlocking the spin lock, I log the average elapsed time and the maximum elapsed time.

Now, here is where things get interesting (at least for me):



Test 1: Threads have the scheduling policy as SCHED_RR:



Thread no: 0, Max Time: 34124, Avg Time: 28.114271, Run Cnt: 10000000



Thread no: 1, Max Time: 339256976, Avg Time: 74.781960, Run Cnt: 10000000




Test 2: Threads have the scheduling policy as SCHED_FIFO:



Thread no: 0, Max Time: 33114, Avg Time: 48.414173, Run Cnt: 10000000



Thread no: 1, Max Time: 38637, Avg Time: 24.327742, Run Cnt: 10000000




Test 3: Only single thread, the scheduling policy is SCHED_RR:



Thread no: 0, Max Time: 34584, Avg Time: 54.165470, Run Cnt: 10000000




Note: The main thread is a non-RT thread pinned to a separate processor; it is not important here.

Note 1: All tests give approximately the same results every time I run them.

Note 2: The results shown are the rdtsc output; however, the chrono timer results are very similar.

So I think I may be misunderstanding the scheduler, and I need to ask these questions:

  1. Why do the huge maximum-time spikes occur in Test 1? Tests 2 and 3 don't behave like that...

  2. Why is there such a large gap between the maximum and the average? What causes this, an interrupt such as the timer?

My full test code is:



#include <stdio.h>
#include <stdlib.h>
#include "stdint.h"
#include <float.h>
#include <pthread.h>
#include <cxxabi.h>
#include <limits.h>
#include <sched.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <malloc.h>
#include <chrono>
#include <x86intrin.h> /* for __rdtsc */

/********* TEST CONFIG ************/

#define TEST_PTHREAD_RUN_CNT 10000000 //1000000000
#define NUM_OF_TEST_PTHREADS 2
#define MAIN_THREAD_CORE_INDEX 0
#define TEST_PTHREAD_PRIO 99
#define TEST_PTHREAD_POLICY SCHED_RR

#define TIME_RDTSC 1
#define TIME_CHRONO 0
/**********************************/

/**********************************/
struct param_list_s
{
    unsigned int thread_no;
};
/**********************************/

/********* PROCESS RAM ************/
pthread_t threads[NUM_OF_TEST_PTHREADS];
struct param_list_s param_list[NUM_OF_TEST_PTHREADS];
unsigned long long max_time[NUM_OF_TEST_PTHREADS];
unsigned long long _max_time[NUM_OF_TEST_PTHREADS];
unsigned long long tot_time[NUM_OF_TEST_PTHREADS];
unsigned long long _tot_time[NUM_OF_TEST_PTHREADS];
unsigned long long run_cnt[NUM_OF_TEST_PTHREADS];
unsigned long long cnt;
pthread_spinlock_t lock;
/**********************************/

/*Proto*/
static void configureMemoryBehavior(void);
void create_rt_pthread(unsigned int thread_no);

/*
* Date............:
* Function........: main
* Description.....:
*/
int main(void)
{
    cpu_set_t mask;
    int i;

    for (i = 0; i < NUM_OF_TEST_PTHREADS; ++i)
    {
        max_time[i] = 0;
        tot_time[i] = 0;
        run_cnt[i] = 0;

        _max_time[i] = 0;
        _tot_time[i] = 0;
    }
    cnt = 0;

    printf("\nSetting scheduler affinity for the process...");
    CPU_ZERO(&mask);
    CPU_SET(MAIN_THREAD_CORE_INDEX, &mask);
    sched_setaffinity(0, sizeof(mask), &mask);
    printf("done.\n");

    configureMemoryBehavior();

    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);

    for (i = 0; i < NUM_OF_TEST_PTHREADS; ++i)
    {
        create_rt_pthread(i);
    }

    printf("Waiting for threads to join...\n");
    for (i = 0; i < NUM_OF_TEST_PTHREADS; i++)
    {
        pthread_join(threads[i], NULL);
#if(TIME_RDTSC == 1)
        printf("Thread no: %d, Max Time: %llu, Avg Time: %f, Run Cnt: %llu\n", i, max_time[i], (float)((float)tot_time[i] / run_cnt[i]), run_cnt[i]);
#endif

#if(TIME_CHRONO == 1)
        printf("Thread no: %d, Max Time: %llu, Avg Time: %f, Run Cnt: %llu\n", i, _max_time[i], (float)((float)_tot_time[i] / run_cnt[i]), run_cnt[i]);
#endif
    }
    printf("All threads joined\n");
    printf("Shared Cnt: %llu\n", cnt);

    return 0;
}

/*
* Date............:
* Function........: thread_func
* Description.....:
*/
void *thread_func(void *argv)
{
    unsigned long long i, start_time, stop_time, latency = 0;
    unsigned int thread_no;

    thread_no = ((struct param_list_s *)argv)->thread_no;
    i = 0;
    while (1)
    {
#if(TIME_RDTSC == 1)
        pthread_spin_lock(&lock);
        start_time = __rdtsc();
        ++cnt;
        stop_time = __rdtsc();
        pthread_spin_unlock(&lock);

        if (stop_time > start_time)
        {
            latency = stop_time - start_time;
            ++run_cnt[thread_no];

            tot_time[thread_no] += latency;
            if (latency > max_time[thread_no])
                max_time[thread_no] = latency;
        }
#endif

#if(TIME_CHRONO == 1)
        pthread_spin_lock(&lock);

        auto _start_time = std::chrono::high_resolution_clock::now();
        ++cnt;
        auto _stop_time = std::chrono::high_resolution_clock::now();

        pthread_spin_unlock(&lock);

        auto __start_time = std::chrono::duration_cast<std::chrono::nanoseconds>(_start_time.time_since_epoch()).count();
        auto __stop_time = std::chrono::duration_cast<std::chrono::nanoseconds>(_stop_time.time_since_epoch()).count();
        auto __latency = __stop_time - __start_time;

        if (__stop_time > __start_time)
        {
            _tot_time[thread_no] += __latency;
            ++run_cnt[thread_no];
            if (__latency > _max_time[thread_no])
            {
                _max_time[thread_no] = __latency;
            }
        }
#endif

        if (++i >= TEST_PTHREAD_RUN_CNT)
            break;
    }

    return 0;
}

/*
* Date............:
* Function........: create_rt_pthread
* Description.....:
*/
void create_rt_pthread(unsigned int thread_no)
{
    struct sched_param param;
    pthread_attr_t attr;

    printf("Creating a new real-time thread\n");
    /* Initialize pthread attributes (default values) */
    pthread_attr_init(&attr);

    /* Set a specific stack size */
    pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN);

    /* Set scheduler policy and priority of pthread */
    pthread_attr_setschedpolicy(&attr, TEST_PTHREAD_POLICY);
    param.sched_priority = TEST_PTHREAD_PRIO;
    pthread_attr_setschedparam(&attr, &param);

    /* Set the processor affinity */
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(1, &cpuset);

    pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset);

    /* Use scheduling parameters of attr */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);

    param_list[thread_no].thread_no = thread_no;

    if (pthread_create(&threads[thread_no], &attr, thread_func, (void *)&param_list[thread_no]) != 0)
    {
        printf("Thread could not be created.\n");
        exit(-1);
    }
}

/*
* Date............:
* Function........: configureMemoryBehavior
* Description.....:
*/
static void configureMemoryBehavior(void)
{
    /* Lock current and future pages in RAM (the call was truncated in the
       post; MCL_CURRENT is assumed here alongside MCL_FUTURE). */
    if (mlockall(MCL_CURRENT | MCL_FUTURE))
        perror("mlockall failed:");

    /* Turn off malloc trimming. */
    mallopt(M_TRIM_THRESHOLD, -1);

    /* Turn off mmap usage. */
    mallopt(M_MMAP_MAX, 0);
    printf("done.\n");
}










  • May be OT and a stupid question: how do you ensure that ++cnt; takes effect inside the time measurement? I mean: are you sure that some reordering of statements doesn't make your measurement lie?

    – Scheff, Mar 25 at 11:10

  • Have you tried running the two realtime threads on a dedicated CPU core, without any other processes on that core? See stackoverflow.com/questions/13583146/…

    – Erik Alapää, Mar 26 at 12:28
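
Regarding the reordering concern raised in the first comment: one way to harden the measurement (not used in the original program) is to put compiler barriers around the timed statement so the optimizer cannot move ++cnt outside the two __rdtsc() calls. A minimal sketch, assuming x86 and GCC/Clang; CPU-level out-of-order execution would additionally need a serializing read such as __rdtscp, which is beyond this sketch:

#include <x86intrin.h> /* __rdtsc */

/* Hypothetical helper, not part of the original test: read the TSC with
   compiler barriers so the timed statement cannot be moved across the
   measurement points at compile time. */
static inline unsigned long long rdtsc_fenced(void)
{
    unsigned long long t;
    asm volatile("" ::: "memory"); /* compiler barrier */
    t = __rdtsc();
    asm volatile("" ::: "memory"); /* compiler barrier */
    return t;
}

/* Usage inside the critical section (sketch):
     start_time = rdtsc_fenced();
     ++cnt;
     stop_time  = rdtsc_fenced();
*/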

















c++ multithreading time linux-kernel pthreads






asked Mar 25 at 10:58 by tozak, edited Mar 26 at 14:10 by Gil Hamilton



1 Answer
When you run with SCHED_FIFO, one of your threads starts running. It then runs until it's finished -- because that's how SCHED_FIFO works -- nothing will preempt it. The time it spends within the spinlock therefore is relatively consistent. Then, after the first thread is finished, the second thread runs to completion without contention for the lock. So it too has a more consistent time. There is still some jitter in both due to interrupts and so forth but that is fairly consistent between the two.



When you run with SCHED_RR, one of your threads runs for a while. At the end of a time slice, it gets pre-empted and the other one will get to run -- because that's how SCHED_RR works. Now, there's a good chance it gets pre-empted while holding the spinlock. So, now the other thread is running, it immediately attempts to grab the spinlock, which fails -- because the other thread holds the lock. But it just keeps trying until the end of the time slice (because that's how spinlocks work -- it won't ever block waiting to acquire the lock). Of course it accomplishes nothing during this time. Eventually, the time-slice ends, and the thread holding the lock gets to run again. But the time attributed to that single increment operation now includes all that time waiting for the other thread to spin throughout its time-slice.
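
As a side note on the time-slice point (a sketch for illustration, not part of the answer or of the question's program): the length of the SCHED_RR quantum can be read with sched_rr_get_interval(). On many Linux systems it defaults to about 100 ms (kernel.sched_rr_timeslice_ms), which is the right order of magnitude for the ~339-million-cycle spike in Test 1.

#include <sched.h>
#include <stdio.h>
#include <time.h>

/* Print the SCHED_RR time slice of the calling process (illustrative only). */
int main(void)
{
    struct timespec ts;

    if (sched_rr_get_interval(0, &ts) == 0)
        printf("SCHED_RR quantum: %ld.%09ld s\n", (long)ts.tv_sec, ts.tv_nsec);
    else
        perror("sched_rr_get_interval");

    return 0;
}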



I think if you increase the maximum count (TEST_PTHREAD_RUN_CNT), you'll see that the SCHED_RR behavior evens out as both of your threads eventually get subjected to this effect. Right now, I'm guessing there's a good chance that one thread can pretty much finish within one or two time slices.



If you want to lock out another thread running with an equivalent priority on the same processor, you should probably be using a pthread_mutex_t. That will act pretty much the same as a spinlock in the successful acquisition case, but will block when it can't acquire the lock.
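
For illustration, a minimal sketch of the spinlock-to-mutex swap suggested here, reusing names from the question's code (this is not the original program):

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER; /* would replace pthread_spinlock_t lock */

/* Inside thread_func's loop: same measurement, but a waiter that loses the
   race blocks and yields the CPU instead of spinning away its whole
   SCHED_RR time slice. */
pthread_mutex_lock(&mtx);
start_time = __rdtsc();
++cnt;
stop_time = __rdtsc();
pthread_mutex_unlock(&mtx);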



But then note: the result of that might well turn the SCHED_RR behavior into the SCHED_FIFO behavior: most of the time, the pre-emption will happen while one thread has the lock held, so the other one will get to run for a few instructions until it attempts to acquire the lock, then it will block and the first will get to run again for a full time-slice.



Overall, it's really dicey to attempt to run two RT priority threads on one processor where both of them are expected to run for long periods of time. RT priority will work best where you lock each thread to its own core, or where the RT threads need to get scheduled immediately, but will only run for a short time before blocking again.
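
As a concrete illustration of that last point, in the question's create_rt_pthread() giving each test thread its own core is a one-line change (a sketch; it assumes cores 1 and 2 exist and are otherwise idle):

/* In create_rt_pthread(): pin each test thread to its own core. */
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(1 + thread_no, &cpuset); /* was CPU_SET(1, &cpuset) for both threads */
pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset);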






answered Mar 26 at 15:46 by Gil Hamilton
  • Crystal clear, thanks! I had a misconception that if a thread holds the spin lock, it won't be preempted. But of course it can get preempted; I get it now. Also, you are right: the SCHED_RR test gives abysmal results for both threads when the run count is higher. I also gave it a go with pthread_mutex_t, and the elapsed times are OK with it.

    – tozak, Mar 26 at 18:38

  • One weird observation: I tried a modified version of the above program, with the spinlocks replaced by pthread mutexes (before the accepted answer was posted). This improved the situation with SCHED_RR significantly, but I still had at least one instance where a very long time was logged, which surprised me. Maybe other processes or perhaps kernel threads running on the same core could interfere. 6-core Xeon E5-1650, kernel 4.19.

    – Erik Alapää, Mar 28 at 8:36













