Wednesday, 31 October 2012

Important System calls

In this post I am going to explain about the important system calls used in Linux. Let us start straight away.

fork():

The famous system call which is used to spawn new processes. When a process calls this function, a child is created which is an exact copy of the parent. The underlying things that happen are explained here.

vfork():

This is just like a fork function call, but initially all the pages are shared between the parent and child and it doesn't follow the copy-on-write rule. This is used in the case when the child process calls exec() directly. This function improves the performance of spawning a new process.

exec():

This overwrites the address space of the calling process with the executable program specified as an argument for this function. There are 6 variations for this function. more from here.

mmap():

This is used to map a file onto the RAM and perform reads and writes from this memory instead of disk. more here

fcntl:

This command is used to perform various operations like duplicating the file descriptors, setting and getting the descriptor flags and much more. see here

fstat:

This is used to retrieve all the details of a file when its name or an open file descriptor is given. Similar versions of this are lstat and stat. more here.

ioctl:

This is used to perform the I/O operations on the devices by directly issuing commands to the device driver. The device controller is by-passed in this case. more here

link:

This system call creates a link to a specified file ( especially hard link). The opposite of this is done by unlink. see more

access:

This function is used to check if there are necessary permissions to perform some operations on a file by a process. view more

chmod:

This is used to change the permission bits associated with a file.(ofcourse the calling process should have appropriate permissions to do that) see more.

chown:

It is used to change the owner of a file provided the process has the permissions to do that.see more

umask:

If you want to change the default permission bits created when we create a new file or directory, then you can use this function call. see here

chroot:

This is used to change the root directory of a process. I mean to say that the path which we give as argument to this function, makes the process to think that the "/" starts from the specified point. more here.

wait:

This function call blocks until one of the child processes exits. Another variation is waitpid in which we can specify the pid of a particular process. Another variation is wait3 which is used to get more details of the terminated process like the resource info etc.check out the man pages.

get**id:

getuid returns the user id, geteuid returns the effective user id, getgid gives group ID, getegid gives the effective group ID, getpid gives process ID, getppid gives parent process ID, getpgrp returns the process group ID.

pause:

This function call gets blocked until a signal is received by a process and after that it returns. Another variation of this is sigpause. see more here

alarm:

This is used to specify some seconds as an argument after which a SIGALRM is sent to the process calling this function.

pipe:

This is used to create a pipe between two related processes and takes an argument as an array of two ints. see more.

popen:

Some variation of pipe is popen. This does all the work to create a pipe and returns a file pointer just like for an open file.

flock:

This is used to lock a file by the process. It can be locked in various modes by using a shared lock or exclusive lock. see more. Another variation of this is lockf in which we use the POSIX locks to lock a file.

mprotect:

This is used to protect(or lock) the memory of a process on the RAM. We can specify different flags to set the desired mode. see more.

setfacl:

This is used to set the file access control lists which are more better than the normal permission bits and increase the security of the file. Inorder to get the file ac. ctrl lists use getfacl. see more.

fsck:

It is used to check and repair a file system. The exit code that it returns, can be used to know the various errors in the file system. see more.

ipcrm:

It removes the msg queues, semaphores, shared memory which have the specified ID as argument. see more. Another function ipcs provides information about the various IPC facilities.

message queues:

The msgget returns an ID which corresponds to a new message queue. msgsnd sends a message into the queue. msgrcv blocks until there is atleast one message in the queue and this is returned. msgctl is used to control various activities of the msg queue. check out the man pages.

shared memory:

shmget returns an ID which corresponds to a shared memory. shmat attaches the memory on to the process address space, while shmdt does the opposite thing. Finally, shmctl is used to control various parameters of the shared memory.


The system calls related to semaphores can be seen here and many other signal related system calls from here.


Tuesday, 30 October 2012

Virtual Memory-II

You need to read the first part of this post in order to understand this one.

I just said that there is a severe problem called Thrashing. It is as follows: Suppose, a lower priority process is executing and a higher priority process started taking the frames from it using the global allocation mechanism. This reduces the number of frames of the lower priority process and the page-fault rate increases.

When this drops below the minimum required frames, the process keeps on page-faulting for the same pages that are actively used and spends a lot of time on this rather than its execution. This situation is called as Thrashing.

This reduces the CPU utilization and increases the burden on the Paging device. To increase CPU utilization, the OS starts a new process and this again takes frames from other running processes making the things even worse.

Clearly, we have to prevent this situation. One way is that a thrashing process should not be allowed to take frames from other process. But still this process spends lot of time in the ready queue and the average service time for a page-fault increases since the queue is long.

Other way is to provide each process with minimum required frames and it should not decrease this mark. This way we can prevent thrashing but the hardest part is how to know the minimum number of frames required?

Working set Model:

The basic idea is that every process executes some sequence of routines during its execution. This means that a process uses some pages actively when executing a function which is in that pages. This is called locality of reference.

What we do is, we keep a window of some size and examine the set of pages that are being used, we count this number of pages and allocate that many frames to that process, if in future the size of this set increases we allocate more frames to this process. These frames are obtained by suspending a lower priority process which is later resumed.

In this way, we keep the degree of multi-programming and CPU throughput higher. But the hardest part is keeping track of the window. Since it moves for every reference, it is a bit expensive to calculate the number of frames.

The other way is to check the page fault frequency. In this method, we keep a minimum and maximum value of page-fault frequency. When it falls below the minimum, it means there are more frames allocated than required. So, we take away some frames and start a new process. In the other case, we allocate new frames to the process.

In this way, unlike the previous method, there is no need to suspend a process and the overhead involved is not much.


Memory Mapped Files:

We can extend the concept of demand paging to the files also. Normally, when we open a file and perform read and write operations, they are written to the disk either synchronously or asynchronously. This involves a lot of overhead(disk accesses) and increases the execution time.

A clever approach is to bring the files data just like pages into the main memory and perform reads and writes to it, and at last when the process closes the file, it is copied on to the disk. This reduces the number of disk accesses to a large extent.

If we are interested to share the same file among different processes, we simply map the memory location of the file into the address space of the processes. It is just like Shared Memory in Unix systems.

In Linux, this operation is performed by using mmap() system call.


Memory Mapped I/O:

This technique is for the convenience of the OS. Instead of writing or reading directly from the I/O device, it just writes/reads into the buffers(technically called as registers) which are located somewhere in the main memory and sets some bit to indicate the operation.These locations are called port of that I/O device. These operations may be synchronous or asynchronous and it is up to the OS.


Allocating Kernel Memory:

Kernel memory allocation must be different from normal user-mode process memory because, first, the kernel requires data structures of different size resulting in internal fragmentation due to fixed size pages and secondly, the kernel memory must be contiguous since the hardware might not use the virtual memory service.

Buddy System:

In this method, the buddy system allocates memory in the powers of 2. Suppose, there is a contiguous allocation of 512kB, and we want to allocate 63kB, then we divide the contiguous memory by half until we can satisfy the conditions. Here we stop dividing until 64kB and allocate it.

At last, when the kernel releases memory we coalesce different parts and reclaim the memory. This suffers from internal fragmentation. If kernel needs 65kB, we still give 128kB which wastes the memory leaving the remaining 63kB unused.

Slab Allocation:

In this method, we use slabs which are usually a contiguous set of pages. There will be caches of different sizes to represent different data structures. These various sized memory locations are allocated in advance. Whenever the kernel needs an object, it just marks it as used(Of course, initially all are marked as free) and uses it. When it wants to release the memory, it just marks it as free again. No overhead of allocating/de-allocating!

In the figure, we can see that kernel is using 2 3kB and 3 7kB objects.

These slabs may be either empty or partially filled or full. The allocating algorithm gives priority to fill the partially filled one first and then sees to fill the empty one.

As we can observe, this has advantages like no memory wastage and reduced time to allocate since the objects are allocated in advanced as said previously. Because of the various advantages, linux also uses this method.


In order to still have a better performance, we need to consider other issues which are as follows:

Prepaging:

Initially, when the process enters into the memory, there will be some continuous page faults, until it uses every page it has been allocated. The reason is that the process tries to get all the pages which have the currently executing function.(in other words, this is called Localityof Reference)

In order to avoid the above scenario, we use Prepaging. In this method, all the required pages by which the process can get its locality of reference are loaded into the main memory before the process starts( or restarts if it was previously suspended). Even when we swap-out, we need to copy these entire pages into the swap space.

This method is useful only if the pre-paged pages are used during the execution or else we can ignore this method.


Page Size:

So, let us discuss about the page size needed. What do you think? Should it be small or large? Let us see the factors effecting it.

Larger page size increases the Internal Fragmentation.

Page table size increases with decrease in page size. Hence we need to have larger page size.

Larger pages take more time for I/O operations. While smaller pages increase the total I/O that must be performed.

Larger page size reduces the number of page faults and also may include more locality of Refernces, which are not needed.

Now don't get messed up with these factors. Still it is a big question of what page size to use. But the modern operating systems are favouring larger pages itself.


I/O Interlock:

Sometimes, a process might issue an I/O request and starts to wait in the queue for that device. Meanwhile, another process might page-fault and takes the page of the former process(of course, by global replacement) to which the I/O has to occur. This makes the process messy when the I/O device tries to write to this page.

To prevent this, we associate a lock bit for each frame. Whenever, this bit is set, that page is never selected for replacement and hence the execution happens smoothly.

There are still other advantages in normal page replacement. Consider a low-priority process page-faulted and got the desired page into the memory. When it starts to wait for the CPU, a higher priority process may page-fault and it may take this newly fetched page of the former process (this page has higher probability to be replaced because both reference and modify bits are unset).

We are benifiting a high priority process at the expense of low-priority process. But the work done to fetch page previously is being wasted. To prevent this, we can use the lock bit.

This may be disadvantageous if the process doesn't clear the lock bit, because it makes the frame unusable. So, this should be carefully implemented.


Although I skipped few things, you can still refer them from here.

Friday, 26 October 2012

Virtual Memory-I

In this post I am going to explain about the concept of virtual memory used in the modern Computer Systems.


Why virtual memory?:

Now, the first important question to pose is why to actually use this concept. So, when you have a large program of size say 2 GB, and you want to run it. So, what a computer without virtual memory management usually does is it starts loading the whole program into the RAM first and then when it is ready with the loading it starts executing it. This is not a big problem if the computer runs single program at-a-time like the first version of MS-DOS systems.

But now-a-days, nobody uses such a computer. Everybody wants to have a computer that has Multi-programming capability. So, when such a big program as said previously wants to be executed, it is tough for the computer to execute multiple programs, since the available RAM memory is mostly occupied by the big program.

So, how to overcome this problem? This post explains what to do for that and also the additional benefits of Virtual Memory Technique.


Diving into the topic:

So, you may doubt that why we keep on calling it "virtual memory"? Does this refer to a non-real memory? The answer is yes! It is because the user who writes the program thinks of having a large amount of memory, for instance you can think that the amount of memory that can be used by your program can be as large as 10 GB, even though your RAM is far smaller.

In simple words we are bluffed by the computer in the way we think of the memory. Suppose, you have a RAM of 2 GB. Now, you write a program that creates an array of size 3 GB. Even though you try to access the last byte, this actually gets mapped somewhere in the RAM itself!! Now, don't think some weird thing is happening inside. It is because of the virtual memory!

There may be some programs which have functions to handle exceptions, intuitively we can say that these are rarely executed. Then what is the need of bringing these routines into the main memory? We additionally are wasting the memory.

In order to prevent this, what the computer does is, it simply brings only those parts of the program which will be executed first and further it brings the parts which contain routines only when they are called/referenced.

Here, I was saying the "part" of the program. But formally they are called "pages". There is nothing new with this word. It is the technical way of saying "the whole program is divided into some fixed size chunks which usually have a size in the powers of 2." So, from now on, I refer to these 'parts' as 'pages'.

Now, when we try to execute the program by keeping it partially in the main memory, there are several advantages like:

1. The main memory is saved a lot and this can be used to execute some other programs.

2. Intuitively, there is no need to worry about the available physical memory.

3. The time required to swap out a program is very less, because we have only the small portion of code in the memory. If you are not familiar with "swapping", it is nothing but the temporary suspension of a program in which it is copied from the RAM to some location on the hard disk. This is usually done when the program waits for I/O or some other operation to happen. The available RAM space can be used to execute some other program.


Implementing Virtual memory:

One way of implementing the virtual memory is by Demand Paging. It is very simple and is as follows:

"Bring the pages of a program into the RAM, only when they are referenced"

To make it clear, let me take an example. Suppose that I have written a C program which has 4 games in it. At the beginning, I prompt the user to choose one of them. In this case, it is useless to bring the whole program into the RAM. Instead, we can do this way. First bring the page containing the main function, then allow the user to choose the game, then bring the page with the chosen game into the memory. This method is called demand paging. Here we can see the amount of RAM saved!

Now, let us see how this method is physically implemented. For every process, there will be a page table associated with it. A page table is just like a look-up table in which if we have the page-number for a page we can find its corresponding location on the RAM. This is technically called Frame. A Frame is just like a Page except that the former one refers to RAM while the later one refers to a program. So, we say that the pages of different programs are allocated to different Frames on the RAM.

Since we are not bringing all the pages into the RAM, the entries actually having a page should be distinguished from those entries which doesn't. For this purpose, there is one more bit called valid/invalid bit associated with each page, in which if the bit is set to valid, then there is a page in the RAM at the specified address, or else there isn't.

Page-Fault:

Suppose, that a page is referencing another page which is not currently on RAM or in other words whose valid bit is not set, then this causes operating system to trap. This is called Page-Fault. It temporarily suspends the process, brings in the desired page (provided it is a valid memory reference or else the program terminates saying Segmentation error) into one of the frames, sets the valid bit and fills the entry with corresponding address and resumes the execution of process.

Fork():

Let us see the action of fork() function and its relation with demand paging. When we call fork(), it creates a new process which has the exact address space as that of parent process. To accomplish this internally, one might think to duplicate all the pages of the parent and give them to child. This method has a lot of overhead and makes the process of creating a new process very slow.

To overcome this problem, we use a technique called copy-on-write in which new process is created on the fly. In this method, we bypass the demand paging. It is as follows. Initially, all the pages of the parent and child are shared among them. When either of the process tries to modify one of the pages, the modified page is duplicated between them. This reduces the overhead to create a process.

vfork() is similar to fork(), but initially it doesn't share any pages between the parent and child, i.e., both refer to same pages. Whenever , the child calls exec(), it is given a new set of pages with the specified program in it. This method still increases the speed of execution. Typically, this system call is used when the child process calls exec() immediately.


Page Replacement:

We are allocating a new Frame for the process each time a page fault occurs. If we keep on doing that, then consequently the memory occupied by the process increases, and thus it will be hard for bringing a new process and execute since the RAM is being used up. So, the idea is to allocate fixed number of frames for each process and if the page fault occurs after all the frames are used up, then we try to replace one of the pages and allocate it with the new one.

This can be accomplished using different techniques which are discussed below:

Basic Page Replacement:

This is quite simple. First, when page-fault occurs if there is a free frame use it. Else choose one of the frames to replace(normally called victim frame) by some procedure and copy its contents to swap space and write the desired page into that free frame.

This method takes two disk accesses, one to write the victim frame and other to read the referenced frame. This can be further reduced by having a modify/dirty bit. This is set, if the victim frame is modified else not. If it is not set, we simply overwrite the victim page else we need to perform the same as said previously.

Let us see the different procedures of selecting the victim frame.

FIFO Page Replacement:

In this method, we try to replace the oldest page among all of them. In other words, we keep a time stamp for each page when it was brought. We then choose the page with least time stamp and replace it.

Intuitively, we can say that this method is not useful at all times. There's no big problem if the oldest page is not referenced a lot in the future. But what if that's going to be heavily referenced page?

Moreover, it suffers from Belady's Anamoly, which is as follows: Normally, we may think that when we increase the number of frames for a process, the page fault decreases. But, that's not true!! It may increase also for some weird sequence of page faults.

optimal page replacement:

This is the best-known algorithm, but unfortunately its theatrical. It replaces that page which is least used in future. Of course, we can directly conclude this by intuition that its the best. But we can't implement this because one cannot know by any means which page is gonna be least used in future.

Least Recently Used Algorithm:

In this algorithm, instead of thinking about future we think about past. In other words, that page is going to be replaced which was used long back in the past. Let us take an illustrative example:

Initially 7,0,1 causes page-fault because they are the initial pages entering into memory. Then when 2 comes, we can see that 7 was used long time back and so we replace it . Then, 0 doesn't cause any page-fault. When 3 enters, the one used long time back is 1, so replace it. This process continues.

Good thing is that this algorithm doesn't suffer from Belady's Anamoly. We have to know how to physically implement this.

We can keep a counter for each frame. Initially all are zero. When a page fault occurs we increment the counter values for all the frames by 1. The victim frame is the one with largest count. When we replace the frame, we initialize the counter again to zero. But there are any issues with this approach.

We can use a stack on the other hand. It is as follows. Initially the stack is empty. We then fill it with page numbers whose page fault has occured. If the page number is already in the stack, just bring it to the top, else push that new page number. In this way, the page used long time ago will be at the bottom of the stack. We simply get this number and replace the corresponding one. Sounds good, right?

There are still other algorithms which are improvements to LRU known as LRU-Approximation algorithms.

Additional-Reference-Bits Algorithms:

There will be an 8-bit byte associated with each page. At regular intervals, we shift this byte by one bit to right and replace the MSB with the reference bit (Reference bit is set if that was referenced else cleared). If we see these bytes of the pages we can get a lot of information.

If the byte is say 11111111, then it is heavily used one. If these are zeros instead, it is least used. If we interpret this as a number, then the one with least value is our victim page.

Second-Chance Algorithm:

In this method, we check the reference bit, if it is clear, we simply replace it. If it was set, we clear it and simply proceed to next one. In this way, we are giving a second chance to the page and this is advantageous, if that page is again used.

Enhanced Second-Chance Algorithm:

This examines both the reference bit and modify bit. We have four possibilities. (0,0)...not used & not modified, (best choice to replace). (0,1)...not used but modified,(second choice). (1,0)...used & not modified, (third choice). (1,1)...used & modified,(last choice).


Allocation of Frames:

We can't allocate very less frames or large number of frames. Both cause increased page-fault rate and wastage of memory respectively. Equal Allocation is not such a good idea, because it wastes the memory if the process is small and increases the page-fault if the process is large. Another way to allocate is Proportional Allocation.

We allocate frames to each process based on the total size of the process. This method doesn't distinguish between the priority of the processes. If we want to add this priority feature, we use Global & Local Allocation.

Global Allocation means a process can replace frames from other processes when a page-fault occurs. Local Allocation means a process can its own frames only and not the others.

If we allow higher priority process to use Global Allocation and lower priority process to use local allocation then the problem is solved! But unfortunately, this has a severe issue called Thrashing.


Wednesday, 10 October 2012

Pseudo-Code for thread library

In my previous post, I was explaining about the method of creating the thread library using context switch. In this post, I am gonna give the pseudo-code and then discuss certain issues at last.

I hope this post will give you the clear picture of the thread library :) . I don't say that this is the only way, but its one of the ways of implementing it.

The code:


//create a global queue which stores all the list of created threads...
//and create "Current_thread" a pointer to currently executing thread or context...

void initialize()    //called at the starting of main function...
{
   //Includes all the code like initializing the timer and attaching the signal
// handler function "schedule()" to the signal SIGPROF. 
     
}

void schedule()      //signal handler for SIGPROF
{

 /*swap the context with the thread next to the "Current_thread"
 in the queue.and
point the "Current_thread" to the next thread...

If it was the last thread in the queue, then execute the main function
 once and then again start from the first thread in the queue and continue
 this until all threads have finished their execution
 
*/

/*when there are no more threads in the queue... just stop the 
timer and execute the main function until it completes...
*/

}

void create_thread(thread*, function*, any_argumnets_to_be_passed)
{
   /*Initialize the context using getcontext(), attach a function
 using makecontext() and keep this in the ready_queue*/

//If this was the first thread that is created, then start the timer....

/*important: block all the signals while adding a new thread to the queue...
 and after adding it, just unblock all the signals that were previously 
blocked...
*/
}

void kill_thread()
{

/* when the currently executing thread has finished its execution before
 the context switch, then this function should be executed...
*/

/*the main task is to remove the thread from the ready queue (ofcourse you
 have to block all the signals and then unblock at last..) and at last it
 should call the schedule function. */

}


That's it!! we are done...!!

I was saying in my pseudo-code to block all the signals before adding or removing any thread in the queue. This was the question I posed in my previous post and here is the solution to solve the issue. I said that every process has a signal mask associated with it and it includes all the list of signals that should be blocked. We are gonna change this mask to overcome this issue.

We have to use the following function to perform the operation:

int sigprocmask(int how, const sigset_t *set, sigset_t *oset)

You may not have seen this function before, but for sure you must know the type of 2nd and 3rd arguments as we have discussed in the previous post.

This function is pretty much easy to use. You specify the signal that you want to block in the set variable using the functions that i have discussed previously. Then call this function and it blocks the desired signal. I have to say something about the first argument.

It specifies what the function has to perform exactly. It has three types of values.

SIG_BLOCK makes the function to block the additional signals specified in the set variable along with the previous ones.

SIG_SETMASK will set the mask directly with the fresh values.

SIG_UNBLOCK makes the function to unblock all the signals specified in the set variable.

If the third argument is not null, it returns the previously used mask. In this case the variable old has the previous mask.

It will be clear if i take an example. Here is the sample code to give you a clear idea about the usage of the function.


sigset_t a,b;
sigemptyset(&a);
sigaddset(&a, SIGPROF);   // i want to block SIGPROF...
sigprocmask(SIG_BLOCK, &a, &b); //SIGPROF blocked...
//*******

//perform all the operaitons related to queue here....

//*******
//at last unblock the SIGPROF signal....
sigprocmask(SIG_SETMASK, &b, NULL);
//In the above function call we are simply restoring the previous contents of signal mask... hence SIGPROF gets unblocked...

I hope much of it is clear from the comments beside it. :)


Now, I have said that the function kill_thread() must be executed when a thread gets terminated before its time slot has finished. So, how do you accomplish this. Here comes the uc_pointer to rescue. We just make every threads context's uc_pointer to point to some context which executes the kill_thread in which we remove Current_thread from the queue and then executing the schedule function. This accomplishes the task without any errors.

I hope you will get the desired results without too many segmentation errors. ;).


Tuesday, 2 October 2012

Creating your own Thread Library using Context Switching

As we have known a lot about the pthread library and System V IPC Semaphores, let us try to create our own thread library. This can be accomplished in mainly three ways, and you can learn it from here.

Out of the three methods, I use Context Switching, because it mainly gives the control over the scheduling the threads. We require to explore several new functions to create a thread Library. One more thing is that I prefer to use Round-Robin Scheduling algorithm to schedule the threads. I prefer to call our threads as Tasks (of course, this is not the name which I discovered, but its from ADA :P)rather than the former one.

Getting Started:

So, let us start from scratch. Basically, we allow the user to create as many number of tasks as he wants, and we then try to schedule these tasks.

When we look from the system point of view, we run a single program, in which we change the context for every fixed amount of time interval. The kernel sees it as a single thread of execution. All the necessary scheduling has to be done by us.

You may wonder whats with this context switching? Let me explain, every process on your system has a context, which includes all the necessary entities used for running a process.These include the process stack, Program counter, signal mask and so on.

The process stack refers to the stack used by the process during its execution, like when the process calls another function all the contents of the calling function are pushed on to this stack. This is especially pretty much useful in case of recursive functions. But don't try to find 1000th fibonacci number using a recursive function. Because this involves large number of recursive calls which makes the process run out of the stack memory and the program terminates abnormally.

The program counter as you know, stores the address of the next instruction to be executed. This is useful to resume the process in case it was either sleeping or waiting for I/O to occur. We don't explicitly initialize the value, because the system will automatically does that.

Every process has a signal mask with it. It is used to store the information about those signals it will block. More about this, we will see sooner.

Datatype for Process context:

In C, we have the datatype ucontext_t, for controlling the process context and it is actually a struct defined in the header file ucontext.h. The struct is defined as follows:


struct ucontext_t
{
    ucontext_t *uc_link;
    sigset_t uc_sigmask;    
    stack_t uc_stack;         
    mcontext_t uc_mcontext;
};

Although, you may get confused for now about the description of these datatypes, you will get a clear picture when we look at a sample program

The first one *uc_link is a pointer to another context. When the function associated with the current context completes its execution, the control transfers to the context pointed to by uc_link pointer. If you want nothing to happen after the completion of current context, then simply keep it as NULL.

The second one as I said previously is the signal mask associated with the context. You will come to know more about the datatype sigset_t sooner.

The stack_t is another struct which is defined as follows:


 typedef struct {
               void  *ss_sp;     
               int    ss_flags;  
               size_t ss_size;   
           } stack_t;

The first of these is a generic void pointer ss_sp, which points to the stack memory. We dynamically allocate some finite amount of memory for this pointer.

The second one ss_flags, is the flags associated with the stack, which is out of bounds in our discussion and so we simply initialize it to zero.

The third one is an integer type that stores the amount of memory that you allocated previously to the void pointer. Both values should match else your program may go wrong.

Coming back to our ucontext_t, the last datatype is the one which stores the registers values in it. We should be least concerned about it, because it is generally used by the system.


Now that we have learnt about the ucontext_t datatype, let us examine the basic functions used on this datatype. They are as follows:

int getcontext(ucontext_t *ucp); 

This function gets the context of the currently executing context. In other words, it takes a pointer to ucontext_t and stores all the register values, instruction pointer value which are there when this call is executed. This is either used while initializing a context or to explicitly get the context. It returns 0 on success and -1 on failure.

 int setcontext(const ucontext_t *ucp);

This is used to set the context of the calling process to the context pointed to by ucp. In other words, the control transfers to the state pointed to by ucp which was previously initialized by getcontext.

Let us take some analogy to avoid confusion. Suppose that you are watching a movie and there is an awesome action scene at say 22 minutes and 0 seconds from the beginning and you remembered the time. You completed watching the movie and started watching another one. Now, in the middle of this movie you thought of watching that action scene in the previous movie. So, you stopped playing the current one and started the previous one and forwarded to exactly 22 minutes and started watching it, till the end.

In the above analogy, every movie is like a context, you are watching the movie means you are executing the context. When you remember the time 22:00, it is like calling a getcontext function which stores the value 22 in say 't'. You completed watching the movie meaning the current context finished its execution. You started watching the 2nd one and would like to watch that action scene. This stopping of the 2nd movie and going to 22 minutes of the first one is done by setcontext, which takes the argument as 't'. The 2nd context execution has ended and first one is restored.

I hope you have got a clear idea about these functions. :)

I have been saying 'function associated with a context'. This means that every context should have a function to execute, otherwise there is no purpose of using a context! This attaching a function to a context can be done by using the following function call:

void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...);

The arguments are the pointer to the context to which you want to attach a function, the pointer to the function, the third argument is the number of arguments you want to pass to the function. If you specify 'n', then 'n' arguments should follow after this number.

So, suppose you want to set the context for currently executing context to some other context, but at the same time you don't want to loose the the current executing context, that is you want to store it in another context variable. This can be done simply by calling getcontext() followed by setcontext() or there is another way which is equivalent to above operation. It is by calling the following function:

int swapcontext(ucontext_t *restrict oucp, const ucontext_t *restrict ucp);

When the above function is called it stores the current context into oucp and transfers the control to the context pointed to by ucp. Thus, it avoids the overhead of calling two functions.


Now that we have known about all these functions, let us try to write a small program which demonstrates the usage of the above functions. The program is as follows:


#include<stdio.h>
#include<stdlib.h>
#include<ucontext.h>
#define MEM 64000
ucontext_t T1, T2,Main;
ucontext_t a;
int fn1()
{
 printf("this is from 1\n");
 setcontext(&Main);
}
void fn2()
{
 printf("this is from 2\n");
  setcontext( &a);
 printf("finished 1\n");
}
void start()
{
 getcontext(&a);
 a.uc_link=0;
 a.uc_stack.ss_sp=malloc(MEM);
 a.uc_stack.ss_size=MEM;
 a.uc_stack.ss_flags=0;
 makecontext(&a, (void*)&fn1, 0);
}

int main(int argc, char *argv[])
{
 start();
 getcontext(&Main);
 getcontext(&T1);
 T1.uc_link=0;
 T1.uc_stack.ss_sp=malloc(MEM);
 T1.uc_stack.ss_size=MEM;
 T1.uc_stack.ss_flags=0;
 makecontext(&T1, (void*)&fn1, 0);
 swapcontext(&Main, &T1);
 getcontext(&T2);
 T2.uc_link=0;
 T2.uc_stack.ss_sp=malloc(MEM);
 T2.uc_stack.ss_size=MEM;
 T2.uc_stack.ss_flags=0;
 makecontext(&T2, (void*)&fn2, 0);
 swapcontext(&Main, &T2);
 printf("completed\n");
 exit(0);
}

Although the code looks messy, it will be clear if we can trace out the exact execution path. Start with the main(). Here, we are calling the function start(). It initializes the context of a and attaches the function fn1 to it, by making the call to makecontext. Take a careful look at the allocation of memory to the stack and other assignments.

The control transfers back to main(), where we are initializing the context variable T1.Here we are attaching the function fn1 to T1. Then we are calling swapcontext, in which we are storing the current context of the main function in Main and transferring the control to fn1.

The function fn1 starts executing and displays the string in it. Then we are setting the context to Main again. Now, the previous state of the main function gets restored and execution of main function starts from that point, which is after swapcontext function.

Here, again we initialize the context of T2 and attach the function fn2 to it. We then swap the context with Main again. The function fn2 starts executing and displays the string inside it. The second statement makes the current context set to the context variable a and hence the function fn1 which was attached previously gets executed, which in turn calls the main function context and this completes its execution by printing 'completed'.

One thing worth noting here is that the statement 'finished 1' is not printed at all. The reason is that the set context is called which transfers the control to the function fn1 and thus not executing this statement.

Hence, the output of the program is as follows:

this is from 1
this is from 2
this is from 1
completed

Now that we are clear about these contexts and their switchings let us move on to the next phase. Since we are using Round-Robin scheduling, there must be a function that should execute after every time interval. This function should do all the necessary context switching and restoring. But how to make this happen? You may think of a busy while loop... But remember here we are not executing multiple threads like a pthread library, but only single thread of execution in which we change and restore the contexts continuously mimicing execution of several threads.

Signals come handy in these situations. Now, this is again a new thing which we are gonna discuss. Although, you have unknowingly sent many signals to many processes by using kill command, or pressing Ctrl + C which sends Interrupt signal SIGINT to a process, which merely stops the process.

Delving into signals:

So, we can send any process any signal (of course, we should have the necessary permissions for doing that). Every process has a signal handler associated with it for each signal. Signal handlers are just like normal functions but are executed only when the process receives a signal that corresponds to the function.

Although most of them are default, we can give our own functions to execute for each signal. The default job of a process when it receives SIGTERM is to terminate. Now, let us try to change this behaviour by writing a program as follows:


#include<stdio.h>
#include<stdlib.h>
#include<signal.h>
#include<sys/types.h>
void print(int y)
{
 printf("you can't kill me :D\n");
}
int main(int argc, char *argv[])
{
 
 struct sigaction act, oact;
 act.sa_handler = print;
 sigemptyset(&act.sa_mask);
 act.sa_flags = 0;
 sigaction(SIGINT, &act, NULL);
 while(1);
 printf("yea2");
 exit(0);
}

There are some new terms in this program, but let us see the interesting part first. When you execute this program, it does nothing but keeps on executing an infinite while loop. Now, you try to Ctrl+C it. By default, it should terminate. But that doesn't happen here. We have created a signal handler for it, which is print.

so, when you press Ctrl +C , you end up displaying the same message 'you can't kill me :D'. So, now you may think that you creted a simple virus program which simply wastes the CPU cycles. But that's not true. Not all signals can be assigned a signal handler. One of them is SIGKILL.

To kill this process, either you can close the terminal or simply open a new one and issue the following command:

kill -9 process-id

This command directly kills the process.

Now, coming to the code of the program. We see a new datatype called sigaction. The structure of this datatype is as follows:

struct sigaction {
               void     (*sa_handler)(int);
               void     (*sa_sigaction)(int, siginfo_t *, void *);
               sigset_t   sa_mask;
               int        sa_flags;
               void     (*sa_restorer)(void);
           };

As you compare the program with this structure, we see that sa_handler is a function pointer that points to the actual signal handler. The second one is not needed for now. It is like an alternate signal handler. The third argument is worth explaining now. It is of type sigset_t and it is internally definition depends on the hardware implementation.

The sa_mask stores the information about the signals whether they are blocked or not. Different signals can be added or removed by using the following functions:


int sigemptyset (sigset_t *set);
int sigfillset (sigset_t *set);
int sigaddset (sigset_t *set, int signum);  
int sigdelset (sigset_t *set, int signum);
int sigismember (const sigset_t *set, int signum);

The usage of these functions is simple. The first one excludes all the signals from blocking. The second one is opposite to the previous one. The third one adds a signal specified by signum to the set. Fourth one removes the signal. Fifth one checks if the signal specified by signum is already blocked or not.

If the sa_mask is filled with appropriate signals, then they will get blocked when the signal handler is executing. In the above program we didn't block any of the signals. But, for instance if you want to block signal number 11, then you need to add this to the sa_mask, which will give you the desired result. Also note that the signal which triggered this handler is implicitly blocked when the handler is executing.

Lastly, the sigaction function takes three arguments. The first is the signal name for which you want to create a signal handler, second is the pointer to a sigaction datatype, third is also the same but, when you give a pointer to sigaction rather than NULL, then the variable is filled with the previous sigaction variable. For now, you can keep it as NULL.

The sig_num that I was saying can be referred from here.


Now we have a function that executes on receiving a particular signal. But we don't want to manually give a signal to the process all the time, but instead we want that to be done automatically as well as periodically.

Here comes the itimerval into the picture. First let us see a small program using this, to see what actually happens with this itimerval.

 

#include <stdio.h>
#include <signal.h>
#include <sys/time.h>

int count = 0;
void sighandler(int sig)
{
 
 printf("signal occurred %d times\n",signo, ++count);
}
 
int main(void)
{
 struct itimerval it;
 struct sigaction act, oact;
 act.sa_handler = sighandler;
 sigemptyset(&act.sa_mask);
 act.sa_flags = 0;

 sigaction(SIGPROF, &act, &oact); 
 // Start itimer
 it.it_interval.tv_sec = 4;
 it.it_interval.tv_usec = 50000;
 it.it_value.tv_sec = 1;
 it.it_value.tv_usec = 100000;
 setitimer(ITIMER_PROF, &it, NULL);
 for ( ; ; ) ;
}

If you try to execute the above code, you will see the string 'signal occured .. tiimes' for the first time after (1 + 100000*10-6) or 1.1 seconds. Then from the second time you will see the string after every (4 + 50000*10-6) or 4.05 seconds and it goes on. I think by now you may understand what these numbers mean. So, I don't want to further go deep into the struct itimerval, but recommend you to see the man pages.

Now, observing the code it is clear that itimerval is a datatype for starting an alarm. The signal handler responds for SIGPROF signal, which is generated only if the setitimer() is set for ITIMER_PROF.

In other words, when you want to count the time, there may be different cases. It may be either the direct time, which can be done by giving first argument as ITIMER_REAL and the signal handler should be changed SIGALRM signal.

We may only want to count the time for which the process executed in the user mode. This can be done by passing ITIMER_VIRTUAL and the signal this generates is SIGVTALRM.

You may also want to count the amount of time a process spent in both user mode and kernel mode. Then you need to change the arguments as ITIMER_PROF and signal it generates SIFPROF which is shown in the above program.

Note that the difference between first and third types is that first type counts the time the process spent in ready queue and all other extra times, while the third type doesn't consider these times.


The main part:

We are ready with every function necessary to build a thread library. So, let us discuss the basic idea of building it. So, we define an ready_queue, which stores all the pointers to the contexts of live threads like a linked list, task_count, that counts the number of alive threads, Initializing function, that initializes the signal handler for SIGPROF (here, I use third way of counting the time), set all the values for the alarm.

A task_create function that creates a context and attaches it with a function that was passed as argument to task_create and adds it into the ready_queue.

When the first task is created, the timer is set to start, and the signal handler function will be schedule which chooses one context from the ready_queue and it either sets or swaps with the current context.

There should also be a separate variable for storing the context of the main function, because we should note that the main function will also be scheduled just like the other tasks.


There is one more issue that we forgot to deal which is as follows: The user created a new task and the create_task function was executing, in which the addition of the new task's context into the queue is going on. Now, imagine what happens when a SIGPROF is received? The control straight away transfers to the schedule function, but the queue was in middle of the pointer operations, and so this many lead to segmentation error.

Guess what should be done to overcome this problem? The answer will be updated after some time. For now, you think of implementing the library and discover your own solution to overcome the above issue. I have got these SIGSEGVs for about hundred times until I finally got the correct result.

Although, I was giving a quick glance over all the functions, I hope this will suffice. But, if you have any doubts still crept in, please do comment on this post :).