I've been reading through the horror that is the PCIe spec, and still can't get any kind of resolution to the following question pair.
Does PCIe allow for mapping huge (say 16GB) 64-bit non-prefetchable memory spaces up above the 4GB boundary? Or are they still bound to the same 1GB that they were in the 32-bit days, and there's just no way to call for giant swaths of non-prefetchable space?
Assuming that the spec allows for it (and to my reading it does), do widely available BIOSes support it? Or is it allowed in theory but not done in practice?
No. BAR requests for non-prefetchable memory are limited to using the low 32-bit address space.
http://www.pcisig.com/reflector/msg03550.html
The reason the answer is no has to do with PCI internals. The data structure that describes the address ranges a PCI bus encompasses reserves only enough space to store 32-bit base and limit addresses for non-prefetchable memory and for I/O ranges. However, it does reserve enough space to store a 64-bit base and limit for prefetchable memory.
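Specifically, look at http://wiki.osdev.org/PCI#PCI_Device_Structure, Figure 3 (PCI-to-PCI bridge). This shows a PCI Configuration Space Header Type 0x01 (the header format for a PCI-to-PCI bridge). Notice that starting at register 1C in that table, there are:
1C: 8 bits for the I/O base address (only the top 4 bits are address bits)
1D: 8 bits for the I/O limit address (only the top 4 bits are address bits)
1E-1F: secondary status, not relevant here
20: 16 bits for the non-prefetchable memory base address (only the top 12 bits are address bits)
22: 16 bits for the non-prefetchable memory limit address (only the top 12 bits are address bits)
24: 16 bits for the prefetchable memory base address (top 12 bits are the middle address bits)
26: 16 bits for the prefetchable memory limit address (top 12 bits are the middle address bits)
28: 32 upper bits for the prefetchable memory base address
2C: 32 upper bits for the prefetchable memory limit address
30: 16 upper bits for the I/O base address
32: 16 upper bits for the I/O limit address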
The actual addresses are created by concatenating (parts of) these registers with either 0s (for base addresses) or 1s (for limit addresses). The I/O and non-prefetchable base and limit addresses are 32 bits wide and are formed thus:
I/O Base:               [ 16 upper bits (31:16) : 4 middle bits (15:12) : 12 zeros (11:0) ]
I/O Limit:              [ 16 upper bits (31:16) : 4 middle bits (15:12) : 12 ones (11:0) ]
Non-prefetchable Base:  [ 12 bits (31:20) : 20 zeros (19:0) ]
Non-prefetchable Limit: [ 12 bits (31:20) : 20 ones (19:0) ]
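As a rough illustration of the concatenation described above, here is a minimal Python sketch. The function and parameter names are hypothetical (they are not from any library); the arguments are assumed to be the raw register values read from offsets 1C, 1D, 30, 32 (I/O) and 20, 22 (non-prefetchable memory) of a Type 0x01 header.

```python
def io_window(io_base, io_limit, io_base_upper16, io_limit_upper16):
    """Decode the I/O window of a PCI-to-PCI bridge (hypothetical helper).

    io_base / io_limit are the 8-bit registers; only their top 4 bits
    carry address bits 15:12. The upper16 registers carry bits 31:16.
    """
    base = (io_base_upper16 << 16) | ((io_base & 0xF0) << 8)  # low 12 bits are 0
    limit = (io_limit_upper16 << 16) | ((io_limit & 0xF0) << 8) | 0xFFF  # low 12 bits are 1
    return base, limit


def nonprefetchable_window(mem_base, mem_limit):
    """Decode the non-prefetchable memory window (hypothetical helper).

    mem_base / mem_limit are the 16-bit registers; only their top 12 bits
    carry address bits 31:20. There is no upper register, so the result
    can never exceed 32 bits.
    """
    base = (mem_base & 0xFFF0) << 16  # low 20 bits are 0
    limit = ((mem_limit & 0xFFF0) << 16) | 0xFFFFF  # low 20 bits are 1
    return base, limit
```

Even with every address bit set, `nonprefetchable_window(0xFFF0, 0xFFF0)` tops out at 0xFFFFFFFF, which is the limitation this answer is about.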
The prefetchable base and limit addresses are 64-bit and formed thus:
Prefetchable Base:  [ 32 upper bits (63:32) : 12 middle bits (31:20) : 20 zeros (19:0) ]
Prefetchable Limit: [ 32 upper bits (63:32) : 12 middle bits (31:20) : 20 ones (19:0) ]
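The 64-bit case can be sketched the same way. This is a minimal Python illustration, not a driver: the function and parameter names are hypothetical, and the arguments are assumed to be the raw values of the registers at offsets 24, 26, 28 and 2C of a Type 0x01 header.

```python
def prefetchable_window(pf_base, pf_limit, pf_base_upper32, pf_limit_upper32):
    """Decode the prefetchable memory window of a PCI-to-PCI bridge
    (hypothetical helper).

    pf_base / pf_limit are the 16-bit registers; their top 12 bits carry
    address bits 31:20. The upper32 registers carry address bits 63:32.
    """
    base = (pf_base_upper32 << 32) | ((pf_base & 0xFFF0) << 16)  # low 20 bits are 0
    limit = (pf_limit_upper32 << 32) | ((pf_limit & 0xFFF0) << 16) | 0xFFFFF  # low 20 bits are 1
    return base, limit


# Example: a 16 GiB prefetchable window placed above the 4 GiB boundary.
base, limit = prefetchable_window(0x0001, 0xFFF1, 0x00000004, 0x00000007)
size_gib = (limit - base + 1) // 2**30
```

With the example values, the window starts at 0x400000000 and ends at 0x7FFFFFFFF. A range like this cannot be described by the non-prefetchable registers, which have no upper 32-bit half.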
As you can see, only the prefetchable memory base and limit registers are given enough bits to express a 64-bit address. The I/O and non-prefetchable memory registers are limited to 32 bits.