A file that is given as input to the linker is called an Object File. The linker produces an Image File, which in turn is used as input by the loader.
A blurb from "Microsoft Portable Executable and Common Object File Format Specification":
RVA (relative virtual address). In an image file, the address of an item after it is loaded into memory, with the base address of the image file subtracted from it. The RVA of an item almost always differs from its position within the file on disk (file pointer).
In an object file, an RVA is less meaningful because memory locations are not assigned. In this case, an RVA would be an address within a section (described later in this table), to which a relocation is later applied during linking. For simplicity, a compiler should just set the first RVA in each section to zero.
VA (virtual address). Same as RVA, except that the base address of the image file is not subtracted. The address is called a “VA” because Windows creates a distinct VA space for each process, independent of physical memory. For almost all purposes, a VA should be considered just an address. A VA is not as predictable as an RVA because the loader might not load the image at its preferred location.
Even after reading this, I still don't get it. I have a lot of questions. Can anyone explain it in a practical way? Please stick to the terminology of Object File & Image File as stated.
All I know about addresses is that .data & .text (for function names). If there is something wrong in what I know, please correct me.
EDIT:
After reading the answer given by Francis, I'm clear about what Physical Address, VA & RVA are and what the relation between them is.
The RVAs of all variables & methods must be computed by the linker during relocation. So, (the value of the RVA of a method/variable) == (its offset from the beginning of the file) must be true. But surprisingly, it's not. Why so?
I checked this by using PEView on c:\WINDOWS\system32\kernel32.dll and found that:
(.text is the first section in this dll). From .text through .data, .rsrc till the last byte of the last section (.reloc), RVA & FileOffset are different. Also, the RVA of the first byte of the first section is "always" shown as 0x1000.
My Guess:
All the bytes of data that come before the first section (.text here) are "not" actually loaded into the VA space of the process; these bytes are just used to locate & describe the sections. They can be called "meta section data".
Since they are not loaded into the VA space of the process, the term RVA is also meaningless for them. This is the reason why RVA == FileOffset for these bytes.
Since .text, .data, .rsrc, .reloc are such bytes, rather than 0x00000 the PEView software is starting it from 0x1000.
I cannot understand the 3rd observation. I cannot explain it.
Most Windows processes (*.exe) are loaded at (user-mode) memory address 0x00400000. That's what we call a "virtual address" (VA), because it is visible only inside each process and is translated to a different physical address by the OS (visible at the kernel / driver layer).
For example, a possible physical memory address (visible by the CPU):
0x00300000 on physical memory has process A's main
0x00500000 on physical memory has process B's main
And the OS may have a mapping table:
process A's 0x00400000 (VA) = physical address 0x00300000
process B's 0x00400000 (VA) = physical address 0x00500000
Then when you try to read 0x00400000 in process A, you'll get the content located at 0x00300000 in physical memory.
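The mapping table above can be sketched in a few lines. This is only a toy model: the 4 KiB page size, the `PAGE_TABLES` dictionary and the `translate` helper are assumptions made for illustration, not how Windows actually stores its page tables.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages, assumed for illustration

# Hypothetical per-process page tables: virtual page base -> physical page base.
# The two entries are the ones from the example above.
PAGE_TABLES = {
    "A": {0x00400000: 0x00300000},
    "B": {0x00400000: 0x00500000},
}

def translate(process, va):
    """Translate a virtual address to a physical one for the given process."""
    page_base = va & ~(PAGE_SIZE - 1)   # start of the page containing va
    offset = va & (PAGE_SIZE - 1)       # position of va inside that page
    return PAGE_TABLES[process][page_base] + offset

# The same VA resolves to different physical addresses in A and B.
print(hex(translate("A", 0x00400000)))
print(hex(translate("B", 0x00400000)))
```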
Regarding RVA, it's simply designed to ease relocation. When loading relocatable modules (e.g., a DLL), the system may have to slide them to wherever there is room in the process memory space. So the file layout stores "relative" addresses to make that calculation easy.
For example, a DLL C may have this address:
RVA 0x00001000 DLL C's main entry
When loaded into process A at base address 0x10000000, C's main entry becomes
VA = 0x10000000 + 0x00001000 = 0x10001000
(if process A's VA 0x10000000 is mapped to physical address 0x30000000, then
C's main entry will be at physical address 0x30001000).
When loaded into process B at base address 0x32000000, C's main entry becomes
VA = 0x32000000 + 0x00001000 = 0x32001000
(if process B's VA 0x32000000 is mapped to physical address 0x50000000, then
C's main entry will be at physical address 0x50001000).
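The same arithmetic as a small sketch. The helper names `rva_to_va` and `va_to_rva` are made up for this example; the base addresses and the RVA are the ones used above.

```python
def rva_to_va(image_base, rva):
    """VA of an item = base address the module was loaded at + its RVA."""
    return image_base + rva

def va_to_rva(image_base, va):
    """Inverse direction: subtract the load base from the VA."""
    return va - image_base

ENTRY_RVA = 0x00001000  # DLL C's main entry, from the example above

for name, base in (("A", 0x10000000), ("B", 0x32000000)):
    va = rva_to_va(base, ENTRY_RVA)
    print(f"process {name}: base 0x{base:08X} -> entry VA 0x{va:08X}")
```

The RVA stored in the file never changes; only the base chosen by the loader does.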
Usually an RVA in an image file is relative to the base address at which the image is loaded into memory, but some RVAs may be relative to the starting address of a "section" in image or object files (you have to check the PE format spec for details). Either way, an RVA is relative to "some" base VA.
To summarize,
(edit) regarding claw's new question:
The value of the RVA of a method/variable is NOT always its offset from the beginning of the file. It is usually relative to some VA, which may be the default load base address or a section base VA. That's why I say you must check the PE format spec for details.
Your tool, PEView, is trying to display every byte's RVA relative to the load base address. Since the sections start at different bases, the difference between RVA and file offset may change when crossing sections.
Regarding your guesses, they are very close to the correct answers:
Usually we don't discuss the "RVA" of anything before the sections, but the PE header is still loaded, up to the end of the section headers. The gap between the section headers and the first section body (if any) won't be loaded. You can check that with a debugger. Moreover, when there are gaps between sections, they may not be loaded.
As I said, an RVA is simply "relative to some VA", no matter what VA it is (although when talking about PE, the VA usually refers to the load base address). When you read the PE format spec you may find some "RVAs" that are relative to a special address such as the resource starting address. PEView lists RVAs from 0x1000 because that section starts at 0x1000. Why 0x1000? Because the linker reserved 0x1000 bytes for the PE header, so the first section's RVA starts at 0x1000.
What you've missed is the concept of "section" in PE loading stage. The PE may contain several "sections", each section maps to a new starting VA address. For example, this is dumped from win7 kernel32.dll:
#  Name    VirtSize  RVA       PhysSize  Offset
1  .text   000C44C1  00001000  000C4600  00000800
2  .data   00000FEC  000C6000  00000E00  000C4E00
3  .rsrc   00000520  000C7000  00000600  000C5C00
4  .reloc  0000B098  000C8000  0000B200  000C6200
There is an invisible "0 header RVA=0000, SIZE=1000" entry, which forces .text to start at RVA 1000. The sections are laid out contiguously when loaded into memory (i.e., in VA space), so their RVAs are contiguous. However, since memory is allocated in pages, each section's RVA is a multiple of the page size (4096 = 0x1000 bytes). That's why section #2 starts at 1000 + C5000 = C6000 (C5000 is C44C1 rounded up to the next multiple of 0x1000).
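As a rough check of that rounding, the RVA column of the table can be recomputed from the VirtSize column alone. The helper names `align_up` and `virtual_layout` are made up for this sketch, and it assumes the sections follow each other with no extra gaps, as they do in this dump.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

SECTION_ALIGNMENT = 0x1000      # page size, as in the explanation above
HEADER_SIZE_IN_MEMORY = 0x1000  # the invisible "header" entry

# (name, VirtSize) taken from the kernel32.dll dump above
VIRT_SIZES = [(".text", 0xC44C1), (".data", 0x00FEC),
              (".rsrc", 0x00520), (".reloc", 0x0B098)]

def virtual_layout(sections, first_rva, alignment):
    """Assign each section the next page-aligned RVA after the previous one."""
    layout = {}
    rva = first_rva
    for name, virt_size in sections:
        layout[name] = rva
        rva = align_up(rva + virt_size, alignment)
    return layout

for name, rva in virtual_layout(VIRT_SIZES, HEADER_SIZE_IN_MEMORY,
                                SECTION_ALIGNMENT).items():
    print(f"{name:<7} RVA {rva:08X}")
```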
In order to support memory mapping, these sections must also be aligned in the file to some size (the file alignment size, decided by the linker; in my example above it's 0x200 = 512 bytes), which controls the PhysSize field. Offset means "offset from the beginning of the physical PE file".
So the headers occupy 0x800 bytes of the file (and 0x1000 when mapped to memory), which gives the offset of section #1. Then by aligning its data (C44C1 bytes) to 0x200, we get PhysSize C4600. C4600 + 800 = C4E00, which is exactly the offset of the second section.
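The file side can be checked the same way. In this sketch (helper names are again made up) the PhysSize values are taken from the table rather than derived, because PhysSize is not always the aligned VirtSize: .data has VirtSize FEC but PhysSize E00, since part of a data section may exist only in memory.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

FILE_ALIGNMENT = 0x200        # from the example above
HEADERS_SIZE_ON_DISK = 0x800  # file offset of the first section

# (name, PhysSize) taken from the kernel32.dll dump above
PHYS_SIZES = [(".text", 0xC4600), (".data", 0x00E00),
              (".rsrc", 0x00600), (".reloc", 0x0B200)]

def file_offsets(sections, first_offset):
    """Each section starts in the file where the previous one ended."""
    offsets = {}
    offset = first_offset
    for name, phys_size in sections:
        offsets[name] = offset
        offset += phys_size
    return offsets

# .text: C44C1 bytes of data rounded up to the file alignment
print(f".text PhysSize {align_up(0xC44C1, FILE_ALIGNMENT):08X}")
for name, offset in file_offsets(PHYS_SIZES, HEADERS_SIZE_ON_DISK).items():
    print(f"{name:<7} Offset {offset:08X}")
```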
OK, this is related to whole PE loading stuff so it may be a little hard to understand...
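If it helps, here is a minimal sketch of how a tool could turn an RVA into a file offset using the section table above. The function name `rva_to_file_offset` is hypothetical, the table is hard-coded from the dump instead of being parsed from a real file, and returning `None` for bytes with no file backing is just a choice made for this example.

```python
HEADERS_SIZE_ON_DISK = 0x800  # headers are mapped starting at RVA 0

# (name, VirtSize, RVA, PhysSize, Offset) from the kernel32.dll dump above
SECTIONS = [
    (".text",  0xC44C1, 0x01000, 0xC4600, 0x00800),
    (".data",  0x00FEC, 0xC6000, 0x00E00, 0xC4E00),
    (".rsrc",  0x00520, 0xC7000, 0x00600, 0xC5C00),
    (".reloc", 0x0B098, 0xC8000, 0x0B200, 0xC6200),
]

def rva_to_file_offset(rva):
    """Return the file offset backing an RVA, or None if no file byte backs it."""
    if 0 <= rva < HEADERS_SIZE_ON_DISK:
        return rva  # inside the headers, RVA == file offset
    for _name, virt_size, sec_rva, phys_size, sec_offset in SECTIONS:
        delta = rva - sec_rva  # position inside this section
        if 0 <= delta < virt_size:
            if delta >= phys_size:
                return None  # exists in memory only (e.g. tail of .data)
            return sec_offset + delta
    return None  # falls in a gap between headers/sections

for rva in (0x10, 0x1000, 0xC6010, 0xC6F00):
    offset = rva_to_file_offset(rva)
    shown = "no file backing" if offset is None else f"offset {offset:08X}"
    print(f"RVA {rva:08X} -> {shown}")
```

This shows why RVA == FileOffset only holds inside the headers: each section has its own (RVA - Offset) difference.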
(edit) let me make a new simple summary again.