Pardon me if you feel this has been answered numerous times, but I need answers to the following queries!
Why data has to be aligned (on 2-byte / 4-byte / 8-byte boundaries)? Here my doubt is when the CPU has address lines Ax Ax-1 Ax-2 ... A2 A1 A0 then it is quite possible to address the memory locations sequentially. So why there is the need to align the data at specific boundaries?
How to find the alignment requirements when I am compiling my code and generating the executable?
If for e.g the data alignment is 4-byte boundary, does that mean each consecutive byte is located at modulo 4 offsets? My doubt is if data is 4-byte aligned does that mean that if a byte is at 1004 then the next byte is at 1008 (or at 1005)?
CPUs are word oriented, not byte oriented. In a simple CPU, memory is generally configured to return one word (32bits, 64bits, etc) per address strobe, where the bottom two (or more) address lines are generally don't-care bits.
Intel CPUs can perform accesses on non-word boundries for many instructions, however there is a performance penalty as internally the CPU performs two memory accesses and a math operation to load one word. If you are doing byte reads, no alignment applies.
Some CPUs (ARM, or Intel SSE instructions) require aligned memory and have undefined operation when doing unaligned accesses (or throw an exception). They save significant silicon space by not implementing the much more complicated load/store subsystem.
Alignment depends on the CPU word size (16, 32, 64bit) or in the case of SSE the SSE register size (128 bits).
For your last question, if you are loading a single data byte at a time there is no alignment restriction on most CPUs (some DSPs don't have byte level instructions, but its likely you won't run into one).