Friday 26 October 2012

Virtual Memory-I

In this post I am going to explain the concept of virtual memory used in modern computer systems.


Why virtual memory?

Now, the first important question to pose is: why use this concept at all? Suppose you have a large program, say 2 GB in size, and you want to run it. A computer without virtual memory management first loads the whole program into the RAM and only then starts executing it. This is not a big problem if the computer runs a single program at a time, like the first versions of MS-DOS.

But nowadays, nobody uses such a computer. Everybody wants a computer with multiprogramming capability. When a big program like the one above is executed, it is tough for the computer to run other programs alongside it, since the available RAM is mostly occupied by the big program.

So, how do we overcome this problem? This post explains what to do about it, and also the additional benefits of the virtual memory technique.


Diving into the topic:

So, you may wonder why we keep calling it "virtual memory". Does this refer to a non-real memory? The answer is yes! It is because the user who writes the program gets the illusion of a large amount of memory: for instance, you can assume that the amount of memory usable by your program is as large as 10 GB, even though your RAM is far smaller.

In simple words, the computer bluffs us about the memory. Suppose you have a RAM of 2 GB. Now, you write a program that creates an array of size 3 GB. Even when you access the last byte, it actually gets mapped somewhere in the RAM itself!! Don't think something weird is happening inside. It is because of virtual memory!
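
To see this in action, here is a minimal C sketch, assuming a 64-bit Linux machine whose kernel overcommits memory: the malloc reserves 3 GB of virtual address space, and a physical page is mapped only when a byte is actually touched.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t size = 3UL * 1024 * 1024 * 1024;   /* 3 GB of *virtual* memory */
    char *buf = malloc(size);                 /* usually succeeds: nothing is mapped yet */
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }
    buf[size - 1] = 'x';   /* touching the last byte maps just that page into RAM */
    printf("last byte = %c\n", buf[size - 1]);
    free(buf);
    return 0;
}
```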

Some programs have functions to handle exceptional conditions, and intuitively we can say that these are rarely executed. Then what is the need of bringing these routines into main memory? We would just be wasting memory.

To prevent this, the computer simply brings in only those parts of the program which will be executed first, and brings in the parts containing the other routines only when they are called/referenced.

Here, I have been saying "part" of the program, but formally these are called "pages". There is nothing new in this word; it is the technical way of saying "the whole program is divided into fixed-size chunks whose size is usually a power of 2." So, from now on, I will refer to these 'parts' as 'pages'.
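
To make the page idea concrete, here is a small sketch assuming a typical 4 KB page size: a virtual address is just a page number plus an offset within that page.

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* 4 KB pages, so the offset takes the low 12 bits */

int main(void) {
    uint32_t vaddr = 0x12345;                 /* an example virtual address */
    uint32_t page  = vaddr / PAGE_SIZE;       /* which page the address lies in */
    uint32_t off   = vaddr % PAGE_SIZE;       /* where inside that page */
    printf("page = %u, offset = %u\n", page, off);
    return 0;
}
```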

Now, when we execute a program by keeping it only partially in main memory, there are several advantages:

1. A lot of main memory is saved, and it can be used to execute other programs.

2. Intuitively, the programmer no longer needs to worry about the available physical memory.

3. The time required to swap out a program is much shorter, because only a small portion of the code is in memory. If you are not familiar with "swapping", it is nothing but the temporary suspension of a program, in which it is copied from the RAM to some location on the hard disk. This is usually done when the program waits for I/O or some other operation to happen, so that the freed RAM can be used to execute some other program.


Implementing Virtual Memory:

One way of implementing virtual memory is by Demand Paging. It is very simple and is as follows:

"Bring the pages of a program into the RAM, only when they are referenced"

To make this clear, let me take an example. Suppose I have written a C program which has 4 games in it, and at the beginning it prompts the user to choose one of them. In this case, it is useless to bring the whole program into the RAM. Instead, we can do it this way: first bring in the page containing the main function, allow the user to choose a game, and then bring the page with the chosen game into memory. This method is called demand paging. Here we can see the amount of RAM saved!

Now, let us see how this method is physically implemented. Every process has a page table associated with it. A page table is just like a look-up table: given the page number of a page, we can find its corresponding location in the RAM, which is technically called a Frame. A Frame is just like a Page, except that the former refers to the RAM while the latter refers to a program. So, we say that the pages of different programs are allocated to different Frames in the RAM.

Since we are not bringing all the pages into the RAM, the entries that actually map a page must be distinguished from those that don't. For this purpose, there is one more bit, called the valid/invalid bit, associated with each page: if the bit is set to valid, then the page is in the RAM at the specified address; otherwise it isn't.
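
A minimal sketch of such a page table in C (the struct layout, names and sizes here are purely illustrative, not what any real OS uses):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t frame;   /* frame number in RAM, meaningful only if valid */
    bool     valid;   /* is this page currently present in a frame? */
} pte_t;

#define NUM_PAGES 1024
pte_t page_table[NUM_PAGES];   /* one table per process */

/* Translate a page number to a frame number; -1 signals a page fault. */
int lookup(uint32_t page_number) {
    if (page_number >= NUM_PAGES || !page_table[page_number].valid)
        return -1;   /* page fault: the OS must bring the page in */
    return (int)page_table[page_number].frame;
}
```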

Page-Fault:

Suppose the process references a page which is not currently in RAM, in other words one whose valid bit is not set. This causes a trap to the operating system, which is called a Page-Fault. The OS temporarily suspends the process, brings the desired page into one of the frames (provided it is a valid memory reference; otherwise the program is terminated with a segmentation fault), sets the valid bit, fills the entry with the corresponding address, and resumes the execution of the process.
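
The same sequence can be sketched as a toy simulation in C; handle_page_fault and touch are hypothetical names, and the trivial frame allocator stands in for the real OS machinery:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_PAGES  16
#define NUM_FRAMES 4

typedef struct { uint32_t frame; bool valid; } pte_t;
static pte_t page_table[NUM_PAGES];        /* repeated here so this compiles alone */
static int next_free = 0;                  /* toy free-frame allocator */

/* Toy page-fault handler: grab a frame, "read" the page in, fix the entry. */
static void handle_page_fault(uint32_t page) {
    int frame = next_free++ % NUM_FRAMES;  /* a real OS would pick a victim here */
    printf("fault on page %u -> loaded into frame %d\n", page, frame);
    page_table[page].frame = frame;
    page_table[page].valid = true;
}

static uint32_t touch(uint32_t page) {
    if (!page_table[page].valid)           /* invalid bit: trap to the OS */
        handle_page_fault(page);
    return page_table[page].frame;
}

int main(void) {
    touch(3); touch(3); touch(7);          /* the second touch(3) is a hit */
    return 0;
}
```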

fork():

Let us see the action of the fork() function and its relation with demand paging. When we call fork(), it creates a new process which has the exact same address space as the parent process. To accomplish this internally, one might think of duplicating all the pages of the parent and giving copies to the child. This method has a lot of overhead and makes creating a new process very slow.

To overcome this problem, we use a technique called copy-on-write, in which the new process is created on the fly, bypassing demand paging. It works as follows: initially, all the pages of the parent are shared between parent and child. When either of the processes tries to modify one of the pages, only that page is duplicated, and the modification goes to the copy. This reduces the overhead of creating a process.
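
We can actually observe copy-on-write semantics from user space: after fork(), a write by the child lands on the child's private copy of the page, leaving the parent's value intact. A minimal sketch (error handling omitted):

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int x = 42;   /* lives on a page shared copy-on-write after fork() */

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {            /* child */
        x = 100;               /* first write: the kernel copies the page now */
        printf("child:  x = %d\n", x);   /* 100 */
        return 0;
    }
    wait(NULL);                /* parent */
    printf("parent: x = %d\n", x);       /* still 42: its page was untouched */
    return 0;
}
```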

vfork() is similar to fork(), but it doesn't copy or duplicate any pages for the child at all: both processes refer to the same pages, with the parent suspended in the meantime. When the child calls exec(), it is given a new set of pages with the specified program in them. This makes process creation even faster. Typically, this system call is used when the child process calls exec() immediately.
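
The canonical, and only really safe, use of vfork() is to call exec() immediately in the child, for example:

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = vfork();       /* parent is suspended; no pages are copied */
    if (pid == 0) {
        execl("/bin/ls", "ls", "-l", (char *)NULL);  /* child gets fresh pages */
        _exit(1);              /* reached only if exec fails */
    }
    wait(NULL);                /* parent resumes once the child execs/exits */
    return 0;
}
```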


Page Replacement:

We allocate a new Frame for the process each time a page fault occurs. If we keep on doing that, the memory occupied by the process keeps increasing, and it becomes hard to bring in and execute a new process since the RAM gets used up. So, the idea is to allocate a fixed number of frames to each process; if a page fault occurs after all the frames are used up, we replace one of the resident pages with the new one.

This can be accomplished using different techniques which are discussed below:

Basic Page Replacement:

This is quite simple. When a page-fault occurs, if there is a free frame, use it. Else, choose one of the frames to replace (normally called the victim frame) by some procedure, copy its contents to swap space, and read the desired page into the freed frame.

This method takes two disk accesses: one to write out the victim frame and the other to read in the referenced page. This can be reduced by keeping a modify/dirty bit, which is set if the page in the victim frame has been modified. If it is not set, we can simply overwrite the victim page without writing it back; otherwise we need both accesses as said previously.
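
A sketch of that eviction decision (write_to_swap and read_from_disk are hypothetical stand-ins for the OS's disk I/O routines):

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct { int page; bool dirty; } frame_t;

/* Hypothetical I/O stand-ins: a real OS would do disk transfers here. */
static void write_to_swap(int page)  { printf("writing page %d to swap\n", page); }
static void read_from_disk(int page) { printf("reading page %d from disk\n", page); }

/* Replace the victim's contents with new_page, skipping the write-back
 * when the dirty bit is clear (one disk access instead of two). */
static void replace(frame_t *victim, int new_page) {
    if (victim->dirty)
        write_to_swap(victim->page);
    read_from_disk(new_page);
    victim->page  = new_page;
    victim->dirty = false;
}

int main(void) {
    frame_t f = { .page = 5, .dirty = true };
    replace(&f, 9);
    return 0;
}
```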

Let us see the different procedures of selecting the victim frame.

FIFO Page Replacement:

In this method, we replace the oldest page among all of them. In other words, we keep a time stamp for each page recording when it was brought in. We then choose the page with the earliest time stamp and replace it.
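
A minimal FIFO simulation in C, assuming 3 frames and a made-up reference string; the hand always points at the oldest resident page:

```c
#include <stdio.h>

#define NUM_FRAMES 3

int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4};  /* an example reference string */
    int n = sizeof refs / sizeof refs[0];
    int frames[NUM_FRAMES] = {-1, -1, -1};
    int hand = 0, faults = 0;

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < NUM_FRAMES; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (!hit) {                      /* fault: evict the oldest page */
            frames[hand] = refs[i];
            hand = (hand + 1) % NUM_FRAMES;
            faults++;
        }
    }
    printf("FIFO page faults: %d\n", faults);
    return 0;
}
```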

Intuitively, we can say that this method is not good at all times. There's no big problem if the oldest page is not referenced much in the future. But what if it is going to be a heavily referenced page?

Moreover, it suffers from Belady's Anomaly, which is as follows: normally, we may think that when we increase the number of frames for a process, the number of page faults decreases. But that's not always true!! For some weird sequences of page references, it may even increase.

Optimal Page Replacement:

This is the best algorithm known, but unfortunately it's theoretical. It replaces the page that will not be used for the longest time in the future. Of course, we can conclude by intuition that it's the best. But we can't implement it, because one cannot know by any means which page is going to be used farthest in the future.
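
Although it cannot be implemented online, optimal replacement is easy to simulate offline when the whole reference string is known in advance. A sketch of the victim selection (the resident page whose next use lies farthest away):

```c
/* Pick the victim for OPT: the resident page not needed for the longest time.
 * refs is the full reference string, pos the index of the faulting access. */
int opt_victim(const int frames[], int nframes,
               const int refs[], int nrefs, int pos) {
    int victim = 0, farthest = -1;
    for (int j = 0; j < nframes; j++) {
        int next = nrefs;                 /* never used again: ideal victim */
        for (int k = pos + 1; k < nrefs; k++)
            if (refs[k] == frames[j]) { next = k; break; }
        if (next > farthest) { farthest = next; victim = j; }
    }
    return victim;                        /* index of the frame to replace */
}
```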

Least Recently Used Algorithm:

In this algorithm, instead of looking into the future, we look at the past. In other words, we replace the page which was used longest ago. Let us take an illustrative example with 3 frames and the reference string 7, 0, 1, 2, 0, 3, ...:

Initially, 7, 0 and 1 cause page-faults because they are the first pages entering memory. Then, when 2 comes, we can see that 7 was used longest ago, so we replace it. Then, 0 doesn't cause any page-fault. When 3 enters, the one used longest ago is 1, so we replace it. This process continues.

The good thing is that this algorithm doesn't suffer from Belady's Anomaly. Now we have to see how to physically implement it.

We can keep a counter for each frame, all initially zero. When a page fault occurs, we increment the counter values of all the frames by 1; the victim frame is the one with the largest count, and when we replace a frame we reset its counter to zero. But there are many issues with this approach.

On the other hand, we can use a stack. It works as follows: initially the stack is empty, and we update it on every reference. If the page number is already in the stack, we just bring it to the top; else we push the new page number, evicting the page at the bottom when the frames are full. In this way, the page used longest ago will always be at the bottom of the stack. We simply take this number and replace the corresponding page. Sounds good, right?
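
Here is a small sketch of that stack idea, using an array ordered from least recently used (bottom) to most recently used (top); the names are illustrative:

```c
#include <stdio.h>
#include <string.h>

#define NUM_FRAMES 3

int stack[NUM_FRAMES];   /* stack[0] = least recently used, stack[count-1] = most */
int count = 0;

void reference(int page) {
    int i;
    for (i = 0; i < count; i++)
        if (stack[i] == page) break;
    if (i == count) {                       /* miss: page fault */
        printf("fault on page %d\n", page);
        if (count == NUM_FRAMES) i = 0;     /* full: victim is the bottom entry */
        else count++;
    }
    /* shift everything above position i down, then put the page on top */
    memmove(&stack[i], &stack[i + 1], (count - 1 - i) * sizeof stack[0]);
    stack[count - 1] = page;
}

int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3};
    for (int i = 0; i < 10; i++)
        reference(refs[i]);
    return 0;
}
```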

There are also other algorithms, which approximate LRU more cheaply, known as LRU-Approximation algorithms.

Additional-Reference-Bits Algorithm:

There is an 8-bit byte associated with each page. At regular intervals, we shift this byte one bit to the right and set its MSB to the page's reference bit (the reference bit is set if the page was referenced during that interval, else it is clear). Looking at these bytes tells us a lot about each page.

If the byte is, say, 11111111, then the page is heavily used. If it is all zeros instead, the page has not been used recently. If we interpret the byte as an unsigned number, then the page with the least value is our victim.
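
A sketch of the shifting step, run at each timer interval; the page with the smallest history byte is the victim:

```c
#include <stdint.h>

#define NUM_PAGES 8

uint8_t history[NUM_PAGES];   /* 8 bits of reference history per page */
uint8_t ref_bit[NUM_PAGES];   /* set by hardware when the page is touched */

/* At each interval: shift right, put the fresh reference bit in the MSB. */
void age_pages(void) {
    for (int i = 0; i < NUM_PAGES; i++) {
        history[i] = (uint8_t)((history[i] >> 1) | (ref_bit[i] << 7));
        ref_bit[i] = 0;       /* clear for the next interval */
    }
}

/* Victim = the page whose history byte is smallest (used least recently). */
int pick_victim(void) {
    int victim = 0;
    for (int i = 1; i < NUM_PAGES; i++)
        if (history[i] < history[victim]) victim = i;
    return victim;
}
```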

Second-Chance Algorithm:

In this method, we check the reference bit of each page in turn: if it is clear, we simply replace that page; if it is set, we clear it and proceed to the next one. In this way, we are giving a second chance to the page, which is advantageous if that page is used again soon.
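
This is usually implemented as a "clock": the frames are arranged in a circle and a hand sweeps over them. A minimal sketch:

```c
#include <stdbool.h>

#define NUM_FRAMES 4

bool ref[NUM_FRAMES];   /* reference bits, set by hardware on access */
int  hand = 0;          /* the clock hand */

/* Sweep until a frame with a clear reference bit is found; clear set
 * bits as we pass them, giving those pages their second chance. */
int second_chance_victim(void) {
    while (ref[hand]) {
        ref[hand] = false;             /* used recently: spare it this round */
        hand = (hand + 1) % NUM_FRAMES;
    }
    int victim = hand;
    hand = (hand + 1) % NUM_FRAMES;    /* advance past the chosen victim */
    return victim;
}
```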

Enhanced Second-Chance Algorithm:

This examines both the reference bit and the modify bit, giving four classes:

1. (0, 0) not recently used and not modified: the best page to replace.

2. (0, 1) not recently used but modified: second choice (it needs a write-back).

3. (1, 0) recently used but not modified: third choice.

4. (1, 1) recently used and modified: last choice.
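
Encoding each (reference, modify) pair as a two-bit number makes the ordering mechanical: the victim is any frame in the lowest non-empty class. A tiny sketch:

```c
#include <stdbool.h>

/* Class 0 = (0,0), the best victim; class 3 = (1,1), the worst. */
int replacement_class(bool referenced, bool modified) {
    return ((int)referenced << 1) | (int)modified;
}
```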


Allocation of Frames:

We can't allocate too few frames or too many frames: the former increases the page-fault rate and the latter wastes memory. Equal allocation is not such a good idea, because it wastes memory if a process is small and increases page faults if a process is large. Another way to allocate is Proportional Allocation.

Here we allocate frames to each process in proportion to the total size of the process. This method doesn't take the priority of processes into account. If we want to add a priority feature, we use Global & Local Allocation.
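
A sketch of the proportional split, using made-up numbers: each process of size s_i gets roughly s_i / S * m of the m available frames, where S is the combined size of all processes.

```c
#include <stdio.h>

int main(void) {
    int sizes[] = {10, 127};              /* process sizes in pages (example) */
    int m = 62, total = 137;              /* 62 frames, combined size 137 */
    for (int i = 0; i < 2; i++)
        printf("process %d gets %d frames\n",
               i, sizes[i] * m / total);  /* a_i = s_i / S * m */
    return 0;
}
```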

Global Allocation means a process can take frames from other processes when a page-fault occurs. Local Allocation means a process can replace only its own frames and not those of others.

If we allow higher-priority processes to use Global Allocation and lower-priority processes to use Local Allocation, then the problem is solved! But unfortunately, this has a severe issue called Thrashing.

