What may be the cause of the elapsed-time spikes in a pthread critical section when the scheduling policy is SCHED_RR?
I am doing some timing tests on Linux. My kernel is PREEMPT_RT (however, a vanilla kernel gives similar results in my tests...)
I have two pthreads running concurrently on the same processor (affinity set). They are real-time threads (priority 99).
I have a critical section protected by a spin lock, where the two threads compete for the lock. Inside the critical section there is a single increment operation, and I try to measure the elapsed time of this operation.
Code example with __rdtsc:
pthread_spin_lock(&lock);
start_time = __rdtsc();
++cnt; //shared ram variable, type is unsigned long long
stop_time = __rdtsc();
pthread_spin_unlock(&lock);
Code example with chrono timer:
pthread_spin_lock(&lock);
auto _start_time = std::chrono::high_resolution_clock::now();
++cnt; //shared ram variable, type is unsigned long long
auto _stop_time = std::chrono::high_resolution_clock::now();
pthread_spin_unlock(&lock);
The threads run in a loop a couple of million times and then terminate. After unlocking the spin lock, I log the average elapsed time and the maximum elapsed time.
Now, here is where things get interesting (at least for me):
Test 1: Threads have the scheduling policy as SCHED_RR:
Thread no: 0, Max Time: 34124, Avg Time: 28.114271, Run Cnt: 10000000
Thread no: 1, Max Time: 339256976, Avg Time: 74.781960, Run Cnt: 10000000
Test 2: Threads have the scheduling policy as SCHED_FIFO:
Thread no: 0, Max Time: 33114, Avg Time: 48.414173, Run Cnt: 10000000
Thread no: 1, Max Time: 38637, Avg Time: 24.327742, Run Cnt: 10000000
Test 3: Only single thread, the scheduling policy is SCHED_RR:
Thread no: 0, Max Time: 34584, Avg Time: 54.165470, Run Cnt: 10000000
Note: The main thread is a non-RT thread pinned to a separate processor. It is not important here.
Note 1: All tests give approximately the same results every time I run them.
Note 2: The results shown come from rdtsc. However, the chrono timer results are very similar.
So I think I may have a misunderstanding of the scheduler, and I need to ask these questions:
- How do the huge maximum-time spikes occur in Test 1? Tests 2 and 3 don't behave like that...
- Why is there such a large gap between the maximum and the average? What causes this, an interrupt such as the timer?
My full test code is:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <float.h>
#include <pthread.h>
#include <cxxabi.h>
#include <limits.h>
#include <sched.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <malloc.h>
#include <chrono>
#include <x86intrin.h> /* __rdtsc */
/********* TEST CONFIG ************/
#define TEST_PTHREAD_RUN_CNT 10000000 //1000000000
#define NUM_OF_TEST_PTHREADS 2
#define MAIN_THREAD_CORE_INDEX 0
#define TEST_PTHREAD_PRIO 99
#define TEST_PTHREAD_POLICY SCHED_RR
#define TIME_RDTSC 1
#define TIME_CHRONO 0
/**********************************/
/**********************************/
struct param_list_s {
    unsigned int thread_no;
};
/**********************************/
/********* PROCESS RAM ************/
pthread_t threads[NUM_OF_TEST_PTHREADS];
struct param_list_s param_list[NUM_OF_TEST_PTHREADS];
unsigned long long max_time[NUM_OF_TEST_PTHREADS];
unsigned long long _max_time[NUM_OF_TEST_PTHREADS];
unsigned long long tot_time[NUM_OF_TEST_PTHREADS];
unsigned long long _tot_time[NUM_OF_TEST_PTHREADS];
unsigned long long run_cnt[NUM_OF_TEST_PTHREADS];
unsigned long long cnt;
pthread_spinlock_t lock;
/**********************************/
/*Proto*/
static void configureMemoryBehavior(void);
void create_rt_pthread(unsigned int thread_no);
/*
* Date............:
* Function........: main
* Description.....:
*/
int main(void)
{
    cpu_set_t mask;
    int i;
    for (i = 0; i < NUM_OF_TEST_PTHREADS; ++i) {
        max_time[i] = 0;
        tot_time[i] = 0;
        run_cnt[i] = 0;
        _max_time[i] = 0;
        _tot_time[i] = 0;
    }
    cnt = 0;
    printf("\nSetting scheduler affinity for the process...");
    CPU_ZERO(&mask);
    CPU_SET(MAIN_THREAD_CORE_INDEX, &mask);
    sched_setaffinity(0, sizeof(mask), &mask);
    printf("done.\n");
    configureMemoryBehavior();
    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
    for (i = 0; i < NUM_OF_TEST_PTHREADS; ++i)
        create_rt_pthread(i);
    printf("Waiting for threads to join...\n");
    for (i = 0; i < NUM_OF_TEST_PTHREADS; i++) {
        pthread_join(threads[i], NULL);
#if(TIME_RDTSC == 1)
        printf("Thread no: %d, Max Time: %llu, Avg Time: %f, Run Cnt: %llu\n", i, max_time[i], (float)((float)tot_time[i] / run_cnt[i]), run_cnt[i]);
#endif
#if(TIME_CHRONO == 1)
        printf("Thread no: %d, Max Time: %llu, Avg Time: %f, Run Cnt: %llu\n", i, _max_time[i], (float)((float)_tot_time[i] / run_cnt[i]), run_cnt[i]);
#endif
    }
    printf("All threads joined\n");
    printf("Shared Cnt: %llu\n", cnt);
    return 0;
}
/*
* Date............:
* Function........: thread_func
* Description.....:
*/
void *thread_func(void *argv)
{
    unsigned long long i, start_time, stop_time, latency = 0;
    unsigned int thread_no;
    thread_no = ((struct param_list_s *)argv)->thread_no;
    i = 0;
    while (1) {
#if(TIME_RDTSC == 1)
        pthread_spin_lock(&lock);
        start_time = __rdtsc();
        ++cnt;
        stop_time = __rdtsc();
        pthread_spin_unlock(&lock);
        if (stop_time > start_time)
            latency = stop_time - start_time;
        ++run_cnt[thread_no];
        tot_time[thread_no] += latency;
        if (latency > max_time[thread_no])
            max_time[thread_no] = latency;
#endif
#if(TIME_CHRONO == 1)
        pthread_spin_lock(&lock);
        auto _start_time = std::chrono::high_resolution_clock::now();
        ++cnt;
        auto _stop_time = std::chrono::high_resolution_clock::now();
        pthread_spin_unlock(&lock);
        auto __start_time = std::chrono::duration_cast<std::chrono::nanoseconds>(_start_time.time_since_epoch()).count();
        auto __stop_time = std::chrono::duration_cast<std::chrono::nanoseconds>(_stop_time.time_since_epoch()).count();
        auto __latency = __stop_time - __start_time;
        if (__stop_time > __start_time)
            _tot_time[thread_no] += __latency;
        ++run_cnt[thread_no];
        if (__latency > _max_time[thread_no])
            _max_time[thread_no] = __latency;
#endif
        if (++i >= TEST_PTHREAD_RUN_CNT)
            break;
    }
    return 0;
}
/*
* Date............:
* Function........: create_rt_pthread
* Description.....:
*/
void create_rt_pthread(unsigned int thread_no)
{
    struct sched_param param;
    pthread_attr_t attr;
    printf("Creating a new real-time thread\n");
    /* Initialize pthread attributes (default values) */
    pthread_attr_init(&attr);
    /* Set a specific stack size */
    pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN);
    /* Set scheduler policy and priority of pthread */
    pthread_attr_setschedpolicy(&attr, TEST_PTHREAD_POLICY);
    param.sched_priority = TEST_PTHREAD_PRIO;
    pthread_attr_setschedparam(&attr, &param);
    /* Set the processor affinity */
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(1, &cpuset);
    pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset);
    /* Use scheduling parameters of attr */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    param_list[thread_no].thread_no = thread_no;
    if (pthread_create(&threads[thread_no], &attr, thread_func, (void *)&param_list[thread_no]) != 0) {
        printf("Thread could not be created.\n");
        exit(-1);
    }
}
/*
* Date............:
* Function........: configureMemoryBehavior
* Description.....:
*/
static void configureMemoryBehavior(void)
{
    /* Lock current and future pages into RAM to avoid page faults. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE))
        perror("mlockall failed:");
    /* Turn off malloc trimming. */
    mallopt(M_TRIM_THRESHOLD, -1);
    /* Turn off mmap usage. */
    mallopt(M_MMAP_MAX, 0);
    printf("done.\n");
}
c++ multithreading time linux-kernel pthreads
Maybe OT and a stupid question: How do you ensure that ++cnt; takes effect inside the time measurement? I mean: are you sure that some re-ordering of statements doesn't make your measurement lie?
– Scheff
Mar 25 at 11:10
Have you tried running the two realtime threads on a dedicated CPU core, without any other processes on that core? See stackoverflow.com/questions/13583146/…
– Erik Alapää
Mar 26 at 12:28
I am doing some time calculation tests in Linux. My kernel is Preempt-RT (however vanilla kernel gives similar results in my tests...)
I have two pthreads, running concurently in the same processor (affinitty given). They are real-time threads (prio 99).
I have a critical section protected by a spin lock, where two threads compete for the lock. Inside the critical section, I have a single increment operation and I try to calculate the elapsed time for this operation.
Code example with __rdtsc:
pthread_spin_lock(&lock);
start_time = __rdtsc();
++cnt; //shared ram variable, type is unsigned long long
stop_time = __rdtsc();
pthread_spin_unlock(&lock);
Code example with chrono timer:
pthread_spin_lock(&lock);
auto _start_time = std::chrono::high_resolution_clock::now();
++cnt; //shared ram variable, type is unsigned long long
auto _stop_time = std::chrono::high_resolution_clock::now();
pthread_spin_unlock(&lock);
Threads run in loop for a couple of million times and then terminate. After unlocking the spin lock, I log the the average elapsed time and the maximum elapsed time.
Now, here the things go interesting (at least for me):
Test 1: Threads have the scheduling policy as SCHED_RR:
Thread no: 0, Max Time: 34124, Avg Time: 28.114271, Run Cnt: 10000000
Thread no: 1, Max Time: 339256976, Avg Time: 74.781960, Run Cnt: 10000000
Test 2: Threads have the scheduling policy as SCHED_FIFO:
Thread no: 0, Max Time: 33114, Avg Time: 48.414173, Run Cnt: 10000000
Thread no: 1, Max Time: 38637, Avg Time: 24.327742, Run Cnt: 10000000
Test 3: Only single thread, the scheduling policy is SCHED_RR:
Thread no: 0, Max Time: 34584, Avg Time: 54.165470, Run Cnt: 10000000
Note: Main thread is a non-rt thread which has an affinity in a seperate processor. It has no important here.
Note1: All tests give approx. similar results everytime I run them.
Note2: The results given are output of the rdtsc. However, the chrono timer results are nearly similar to these.
So I think I may have a mis-understanding of the scheduler, so I need to ask these questions:
- How does the huge maximum time spikes occur in the test 1? Test 2&3 don't behave like it...
- Why is there a very large gap between the maximum and the average calculations? What causes this, an interrupt like timer?
My all test code is:
#include <stdio.h>
#include <stdlib.h>
#include "stdint.h"
#include <float.h>
#include <pthread.h>
#include <cxxabi.h>
#include <limits.h>
#include <sched.h>
#include <sys/mman.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <malloc.h>
#include <chrono>
/********* TEST CONFIG ************/
#define TEST_PTHREAD_RUN_CNT 10000000 //1000000000
#define NUM_OF_TEST_PTHREADS 2
#define MAIN_THREAD_CORE_INDEX 0
#define TEST_PTHREAD_PRIO 99
#define TEST_PTHREAD_POLICY SCHED_RR
#define TIME_RDTSC 1
#define TIME_CHRONO 0
/**********************************/
/**********************************/
struct param_list_s
unsigned int thread_no;
;
/**********************************/
/********* PROCESS RAM ************/
pthread_t threads[NUM_OF_TEST_PTHREADS];
struct param_list_s param_list[NUM_OF_TEST_PTHREADS];
unsigned long long max_time[NUM_OF_TEST_PTHREADS];
unsigned long long _max_time[NUM_OF_TEST_PTHREADS];
unsigned long long tot_time[NUM_OF_TEST_PTHREADS];
unsigned long long _tot_time[NUM_OF_TEST_PTHREADS];
unsigned long long run_cnt[NUM_OF_TEST_PTHREADS];
unsigned long long cnt;
pthread_spinlock_t lock;
/**********************************/
/*Proto*/
static void configureMemoryBehavior(void);
void create_rt_pthread(unsigned int thread_no);
/*
* Date............:
* Function........: main
* Description.....:
*/
int main(void)
cpu_set_t mask;
int i;
for (i = 0; i < NUM_OF_TEST_PTHREADS; ++i)
max_time[i] = 0;
tot_time[i] = 0;
run_cnt[i] = 0;
_max_time[i] = 0;
_tot_time[i] = 0;
cnt = 0;
printf("nSetting scheduler affinity for the process...");
CPU_ZERO(&mask);
CPU_SET(MAIN_THREAD_CORE_INDEX, &mask);
sched_setaffinity(0, sizeof(mask), &mask);
printf("done.n");
configureMemoryBehavior();
pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
for (i = 0; i < NUM_OF_TEST_PTHREADS; ++i)
create_rt_pthread(i);
printf("Waiting threads to joinn...n");
for (i = 0; i < NUM_OF_TEST_PTHREADS; i++)
pthread_join(threads[i], NULL);
#if(TIME_RDTSC == 1)
printf("Thread no: %d, Max Time: %llu, Avg Time: %f, Run Cnt: %llun", i, max_time[i], (float)((float)tot_time[i] / run_cnt[i]), run_cnt[i]);
#endif
#if(TIME_CHRONO == 1)
printf("Thread no: %d, Max Time: %lu, Avg Time: %f, Run Cnt: %lun", i, _max_time[i], (float)((float)_tot_time[i] / run_cnt[i]), run_cnt[i]);
#endif
printf("All threads joinedn");
printf("Shared Cnt: %llun", cnt);
return 0;
/*
* Date............:
* Function........: thread_func
* Description.....:
*/
void *thread_func(void *argv)
unsigned long long i, start_time, stop_time, latency = 0;
unsigned int thread_no;
thread_no = ((struct param_list_s *)argv)->thread_no;
i = 0;
while (1)
#if(TIME_RDTSC == 1)
pthread_spin_lock(&lock);
start_time = __rdtsc();
++cnt;
stop_time = __rdtsc();
pthread_spin_unlock(&lock);
if (stop_time > start_time)
latency = stop_time - start_time;
++run_cnt[thread_no];
tot_time[thread_no] += latency;
if (latency > max_time[thread_no])
max_time[thread_no] = latency;
#endif
#if(TIME_CHRONO == 1)
pthread_spin_lock(&lock);
auto _start_time = std::chrono::high_resolution_clock::now();
++cnt;
auto _stop_time = std::chrono::high_resolution_clock::now();
pthread_spin_unlock(&lock);
auto __start_time = std::chrono::duration_cast<std::chrono::nanoseconds>(_start_time.time_since_epoch()).count();
auto __stop_time = std::chrono::duration_cast<std::chrono::nanoseconds>(_stop_time.time_since_epoch()).count();
auto __latency = __stop_time - __start_time;
if (__stop_time > __start_time)
_tot_time[thread_no] += __latency;
++run_cnt[thread_no];
if (__latency > _max_time[thread_no])
_max_time[thread_no] = __latency;
#endif
if (++i >= TEST_PTHREAD_RUN_CNT)
break;
return 0;
/*
* Date............:
* Function........: create_rt_pthread
* Description.....:
*/
void create_rt_pthread(unsigned int thread_no)
{
    struct sched_param param;
    pthread_attr_t attr;

    printf("Creating a new real-time thread\n");

    /* Initialize pthread attributes (default values) */
    pthread_attr_init(&attr);

    /* Set a specific stack size */
    pthread_attr_setstacksize(&attr, PTHREAD_STACK_MIN);

    /* Set scheduler policy and priority of pthread */
    pthread_attr_setschedpolicy(&attr, TEST_PTHREAD_POLICY);
    param.sched_priority = TEST_PTHREAD_PRIO;
    pthread_attr_setschedparam(&attr, &param);

    /* Set the processor affinity */
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(1, &cpuset);
    pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), &cpuset);

    /* Use scheduling parameters of attr */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);

    param_list[thread_no].thread_no = thread_no;
    if (pthread_create(&threads[thread_no], &attr, thread_func, (void *)&param_list[thread_no]) != 0)
    {
        printf("Thread could not be created.\n");
        exit(-1);
    }
}
/*
* Date............:
* Function........: configureMemoryBehavior
* Description.....:
*/
static void configureMemoryBehavior(void)
{
    /* Lock all current and future pages into RAM */
    if (mlockall(MCL_CURRENT | MCL_FUTURE))
        perror("mlockall failed:");

    /* Turn off malloc trimming. */
    mallopt(M_TRIM_THRESHOLD, -1);

    /* Turn off mmap usage. */
    mallopt(M_MMAP_MAX, 0);

    printf("done.\n");
}
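Since one of the usual failure modes here is that the policy or priority silently fails to apply (e.g. when the process lacks CAP_SYS_NICE), it can help to read back what a thread actually got. This is a hedged sketch, not part of the original program; `report_thread_sched` is an illustrative name:

```cpp
#include <pthread.h>
#include <sched.h>
#include <cstdio>

/* Print the scheduling policy and priority a thread actually ended up with. */
void report_thread_sched(pthread_t t)
{
    int policy;
    struct sched_param param;
    if (pthread_getschedparam(t, &policy, &param) != 0) {
        perror("pthread_getschedparam");
        return;
    }
    const char *name = (policy == SCHED_FIFO)  ? "SCHED_FIFO"
                     : (policy == SCHED_RR)    ? "SCHED_RR"
                     : (policy == SCHED_OTHER) ? "SCHED_OTHER"
                                               : "other";
    printf("policy=%s priority=%d\n", name, param.sched_priority);
}
```

Calling something like `report_thread_sched(threads[thread_no])` right after `pthread_create` would catch the case where the attribute calls succeeded but the new thread nevertheless fell back to SCHED_OTHER.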
c++ multithreading time linux-kernel pthreads
edited Mar 26 at 14:10
Gil Hamilton
asked Mar 25 at 10:58
tozak
May be OT and a stupid question: how do you ensure that ++cnt; takes effect inside of the time measurement? I mean: are you sure that some reordering of statements doesn't make your measurement lie?
– Scheff
Mar 25 at 11:10
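One way to address the reordering concern raised above is to pin the increment between compiler fences so it cannot be hoisted out of the timed region at compile time. A minimal sketch, not the asker's code; `steady_clock`, `cnt_demo`, and `timed_increment` are illustrative choices:

```cpp
#include <atomic>
#include <chrono>

unsigned long long cnt_demo = 0;

/* Time a single increment. The signal fences stop the compiler from moving
 * ++cnt_demo outside the two clock reads; they impose no CPU-level ordering,
 * only compile-time placement (rdtsc users may additionally want lfence). */
long long timed_increment()
{
    auto start = std::chrono::steady_clock::now();
    std::atomic_signal_fence(std::memory_order_seq_cst);
    ++cnt_demo;
    std::atomic_signal_fence(std::memory_order_seq_cst);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```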
Have you tried running the two realtime threads on a dedicated CPU core, without any other processes on that core? See stackoverflow.com/questions/13583146/…
– Erik Alapää
Mar 26 at 12:28
1 Answer
When you run with SCHED_FIFO, one of your threads starts running. It then runs until it's finished -- because that's how SCHED_FIFO works -- nothing will preempt it. The time it spends within the spinlock therefore is relatively consistent. Then, after the first thread is finished, the second thread runs to completion without contention for the lock. So it too has a more consistent time. There is still some jitter in both due to interrupts and so forth, but that is fairly consistent between the two.
When you run with SCHED_RR, one of your threads runs for a while. At the end of a time slice, it gets pre-empted and the other one gets to run -- because that's how SCHED_RR works. Now, there's a good chance it gets pre-empted while holding the spinlock. So now the other thread is running; it immediately attempts to grab the spinlock, which fails -- because the other thread holds the lock. But it just keeps trying until the end of the time slice (because that's how spinlocks work -- it won't ever block waiting to acquire the lock). Of course it accomplishes nothing during this time. Eventually the time slice ends, and the thread holding the lock gets to run again. But the time attributed to that single increment operation now includes all the time the other thread spent spinning through its time slice.
I think if you increase the maximum count (TEST_PTHREAD_RUN_CNT), you'll see that the SCHED_RR behavior evens out, as both of your threads eventually get subjected to this effect. Right now, I'm guessing there's a good chance that one thread can pretty much finish within one or two time slices.
If you want to lock out another thread running with an equivalent priority on the same processor, you should probably be using a pthread_mutex_t. That will act pretty much the same as a spinlock in the successful-acquisition case, but will block when it can't acquire the lock.
But then note: the result of that might well turn the SCHED_RR behavior into the SCHED_FIFO behavior: most of the time, the pre-emption will happen while one thread holds the lock, so the other one will get to run for a few instructions until it attempts to acquire the lock; then it will block and the first will get to run again for a full time slice.
Overall, it's really dicey to attempt to run two RT-priority threads on one processor where both of them are expected to run for long periods of time. RT priority works best where you lock each thread to its own core, or where the RT threads need to get scheduled immediately but will only run for a short time before blocking again.
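The suggested switch to pthread_mutex_t could look like the following in the question's critical section. This is a hedged sketch with illustrative names (`demo_lock`, `demo_cnt`), not the asker's actual code:

```cpp
#include <pthread.h>
#include <chrono>

pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
unsigned long long demo_cnt = 0;

/* Same critical section as the question, but a contended waiter blocks
 * (yielding the CPU) instead of spinning through its whole time slice. */
long long timed_increment_mutex()
{
    pthread_mutex_lock(&demo_lock);
    auto start = std::chrono::steady_clock::now();
    ++demo_cnt;
    auto stop = std::chrono::steady_clock::now();
    pthread_mutex_unlock(&demo_lock);
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Under SCHED_RR this keeps the measured window from absorbing a neighbor's entire time slice, at the cost of a potential pair of context switches on contention.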
Crystal clear, thanks! I had a misconception that if a thread holds the spinlock, it won't be preempted. But of course it can get preempted; I get it now. Also, you are right: the SCHED_RR test gives abysmal results for both threads when the run count is higher. I also gave pthread_mutex_t a go, and the elapsed times are OK with it.
– tozak
Mar 26 at 18:38
One weird observation: I tried a modified version of the above program, with the spinlocks replaced by pthread mutexes (before the accepted answer was posted). This improved the situation with SCHED_RR significantly, but I still had at least one instance where a very long time was logged, which surprised me. Maybe other processes or kernel threads running on the same core could interfere. 6-core Xeon E5-1650, kernel 4.19.
– Erik Alapää
Mar 28 at 8:36
answered Mar 26 at 15:46
Gil Hamilton