I Googled for a long time, but I still don't understand how it works, as most of the explanations are very technical and there are no illustrations to make them clearer. My main confusion is: how does it differ from virtual memory?
I hope this question will have a very good explanation here so that other people who ask the same question can find it here when they Google it.
I have to admit, those two concepts can seem quite complicated and similar at the beginning. Sometimes they are also taught confusingly. A good reference, in my opinion, is osdev.org; see its Segmentation and Paging pages.
For the sake of completeness, I'll try to explain it here too, but I cannot guarantee correctness, as I have not done any OS development for some months.
Segmentation is the older of the two concepts and, in my opinion, the more confusing one. Segmentation works on - as the name says - segments. A segment is a contiguous block of memory of a specific size. To access memory within a segment we need an offset. This makes a total of two address components, which are in fact stored in two registers. One idea behind segmentation was to address more memory while having only 16-bit registers. The other was some sort of protection, though not as elaborate as that of paging.
Because we now use two registers to access memory, we can split memory into chunks - the so-called segments mentioned above. Consider a memory of 1 MB (2^20 bytes). This can be split into 65536 segments (2^16, because of the 16-bit registers) of 16 bytes each. Of course, we also have a 16-bit register for the offset. Addressing 16 bytes with 16 bits is quite useless, so it was decided that segments can overlap (which I think also had performance and programming reasons back then).
The following formula is used to access 1MB of memory with segmentation:
Physical address = (A * 0x10) + B
Here A is the segment and B is the offset. This means the segment value is multiplied by 16 (0x10) and the offset is added to the result. It also means that the address 0x0100 can be reached in many ways, e.g. by A=0x010 and B=0x0, but also by A=0x0 and B=0x0100.
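To make the formula concrete, here is a minimal Python sketch. The function name real_mode_physical is made up for illustration; it only models the arithmetic above, not any real hardware behaviour beyond that.

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """Physical address = (segment * 0x10) + offset, both 16-bit values."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left by 4 bits is the same as multiplying by 0x10.
    return (segment << 4) + offset


# Two different A:B pairs that name the same byte of memory.
print(hex(real_mode_physical(0x0010, 0x0000)))  # 0x100
print(hex(real_mode_physical(0x0000, 0x0100)))  # 0x100
```

Both calls give the same physical address, which is exactly the overlap of segments described above.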
This was segmentation in the old 16-bit days.
If you look at assembler programs or try something yourself, you'll see that there are even dedicated segment registers, such as CS and DS (code segment and data segment).
Later, the so-called Global Descriptor Table (GDT) was introduced. This is a global table (at a specific position in your RAM) that holds, for each segment, its number, its memory address and several other options. This brings us nearer to the concept of paging, but it's still not the same.
So now the programmer can decide where segments should start. Another new concept was that the GDT lets you decide how long a segment should be. So not every segment had to be 64 KB long (2^16, because of the 16-bit registers); the limit could be defined by the programmer. You could have overlapping segments or completely separate segments.
When accessing A:B now (still two registers used for accessing memory), A selects the entry in the GDT (strictly speaking, A is a selector whose upper bits hold the index). So we look up that entry in the GDT and see at which memory location the segment starts and how large it is. We then check whether B (the offset) is within the allowed memory area; if it is, the address accessed is the segment's start plus B.
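The lookup just described can be sketched in Python as follows. This is a simplified model, not the real descriptor layout: the Descriptor class, the table contents and the SegmentFault exception are all made up for illustration, and A is used directly as a table index.

```python
from dataclasses import dataclass


class SegmentFault(Exception):
    """Stand-in for the fault real hardware would raise (hypothetical name)."""


@dataclass
class Descriptor:
    base: int   # memory address where the segment starts
    limit: int  # highest valid offset inside the segment


# Hypothetical table contents, chosen only for this example.
GDT = [
    Descriptor(base=0x0010_0000, limit=0x0FFF),  # a 4 KB segment
    Descriptor(base=0x0020_0000, limit=0xFFFF),  # a 64 KB segment
]


def translate(a: int, b: int) -> int:
    """Resolve A:B by looking up entry A and checking B against its limit."""
    if not 0 <= a < len(GDT):
        raise SegmentFault(f"no GDT entry {a}")
    entry = GDT[a]
    if not 0 <= b <= entry.limit:
        raise SegmentFault(f"offset {b:#x} is outside segment {a}")
    return entry.base + b


print(hex(translate(1, 0x1234)))  # 0x201234
```

An access such as translate(0, 0x1000) raises SegmentFault in this model, because the offset lies one byte past the limit of entry 0.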
Now, paging is not so different from the newer segmentation approach, but with paging each page has a fixed size. So the limit is no longer programmable; each page is (currently) 4 KB. Furthermore, unlike with segmentation, the logical address space can be contiguous without the physical addresses being contiguous.
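The last point can be illustrated with a small Python sketch. The mapping page_to_frame and its values are hypothetical and picked only to show that neighbouring logical pages may end up in unrelated physical frames.

```python
PAGE_SIZE = 4096  # 4 KB pages

# Hypothetical mapping: logical page number -> physical frame number.
page_to_frame = {0: 7, 1: 2, 2: 9}


def to_physical(logical: int) -> int:
    """Translate a logical address using fixed-size pages."""
    page, offset = divmod(logical, PAGE_SIZE)
    return page_to_frame[page] * PAGE_SIZE + offset


# 0x0FFF and 0x1000 are adjacent logically, but not physically.
print(hex(to_physical(0x0FFF)))  # 0x7fff
print(hex(to_physical(0x1000)))  # 0x2000
```

The program sees one unbroken range of addresses, while the underlying frames are scattered across physical memory.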
Paging also uses tables for the lookup, and you still split the logical address into parts. The first part is the number of the entry in the page table, the second part is the offset. However, the offset now has a fixed length of 12 bits, enough to address 4 KB. You can also have more than two parts; then multiple levels of page tables are used. Two-level page tables are common on 32-bit x86, and 64-bit x86 systems use four levels.
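As a rough sketch of a two-level lookup, the following Python code assumes the classic 32-bit x86 split of 10 bits (directory index), 10 bits (table index) and 12 bits (offset). The table contents and the names split_address and walk are made up for illustration.

```python
from typing import Tuple


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a 32-bit logical address into (directory index, table index, offset)."""
    offset = addr & 0xFFF                # low 12 bits: offset within the 4 KB page
    table_index = (addr >> 12) & 0x3FF   # next 10 bits: entry in the page table
    dir_index = (addr >> 22) & 0x3FF     # top 10 bits: entry in the page directory
    return dir_index, table_index, offset


# Hypothetical tables: directory index -> page table (table index -> frame number).
page_directory = {
    0: {0: 5, 1: 3},
    1: {0: 8},
}


def walk(addr: int) -> int:
    """Follow both table levels and return the physical address."""
    dir_index, table_index, offset = split_address(addr)
    frame = page_directory[dir_index][table_index]
    return (frame << 12) | offset


print(split_address(0x00401ABC))  # (1, 1, 2748)
print(hex(walk(0x00400123)))      # 0x8123
```

A missing entry simply raises KeyError here; on real hardware the equivalent situation would be reported to the OS as a page fault.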
I hope I was able to explain it at least a bit, but I think my explanation was also confusing. The best thing is to dive into kernel programming and try to implement the most basic stuff needed when booting an OS. Then you'll find out everything, because due to backwards compatibility all of this is still present on our modern PCs.