They are actually quite different. They serve similar purpose (region queries on spatial data), and they both are trees (and both belong to the family of bounding volume hierarchy indexes), but that is about all they have in common.
- R-Trees are balanced, k-d-trees are not (unless bulk-loaded). This is why R-trees are preferred for changing data, as k-d-trees may need to be rebuilt to re-optimize.
- R-Trees are disk-oriented. They actually organize the data in areas that directly map to the on-disk representation. This makes them more useful in real databases and for out-of-memory operation. k-d-trees are memory oriented and are non-trivial to put into disk pages
- k-d-trees are elegant when bulk-loaded (kudos to SingleNegationElimination for pointing this out), while R-trees are better for changing data (although they do benefit from bulk loading, when used with static data).
- R-Trees do not cover the whole data space. Empty areas may be uncovered. k-d-trees always cover the whole space.
- k-d-trees binary split the data space, R-trees partition the data into rectangles. The binary splits are obviously disjoint; while the rectangles of an R-tree may overlap (which actually is sometimes good, although one tries to minimize overlap)
- k-d-trees are a lot easier to implement in memory, which actually is their key benefit
- R-trees can store rectangles and polygons, k-d-trees only stores point vectors (as overlap is needed for polygons)
- R-trees come with various optimization strategies, different splits, bulk-loaders, insertion and reinsertion strategies etc.
- k-d-trees use the one-dimensional distance to the separating hyperplane as bound; R-trees use the d-dimensional minimum distance to the bounding hyperrectangle for bounding (they can also use the maximum distance for some counting queries, to filter true positives).
- k-d-trees support squared Euclidean distance and Minkowski norms, while Rtrees have been shown to also support geodetic distance (for finding near points on geodata).