As per this question I asked previously on Google App Engine, if I have access to all the information in a standard email, not just the From, To, Subject, Body fields, but also all the headers and MIME information, how can I verify that two incoming emails with the same From address are actually from the same sender?
What I've considered thus far:
I realize this is a complicated question (I'm sure companies like Posterous have spent tons of time on this problem). I'm just looking for a few criteria to get started preliminarily. Thanks!
Update:
The answers so far are really helping. To give them more context: my project is a web app that would receive large volumes of email from its users, who would use email as their primary way of entering data into the system. This is why I made the Posterous analogy; the use case is very similar.
You're right that all of the headers together, plus 'known good' emails to compare against, can help identify likely spoofed emails.
What you're developing would probably be at best a heuristic rather than an algorithm.
I'd consider weighting the fields by time-of-day and how close to 'known good' emails' time-of-day ...
Also consider whether the 'known good' emails are structured differently from the suspect one, e.g. inline images, HTML, shortened URLs, etc.
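To make the idea concrete, here is a rough Python sketch of such a heuristic using only the standard library's email package. The feature choices (mail client header, MIME part types, hour sent), the WEIGHTS values, and the function names are all my own assumptions for illustration, not a tested spam or spoofing filter; you would want to tune them against real traffic.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical weights, chosen only for illustration; they sum to 1.0.
WEIGHTS = {"mailer": 0.3, "structure": 0.3, "hour": 0.4}


def features(raw):
    """Extract a few comparable traits from a raw RFC 822 message string."""
    msg = message_from_string(raw)
    hour = None
    if msg["Date"] is not None:
        try:
            # Hour in the sender's stated timezone offset.
            hour = parsedate_to_datetime(msg["Date"]).hour
        except (TypeError, ValueError):
            hour = None  # unparseable Date header
    return {
        # Mail client string, if the sender's client reports one.
        "mailer": msg.get("X-Mailer") or msg.get("User-Agent"),
        # Set of MIME content types used anywhere in the message.
        "structure": frozenset(p.get_content_type() for p in msg.walk()),
        "hour": hour,
    }


def hour_closeness(a, b):
    """1.0 for the same hour, 0.0 for hours 12 apart (clock is circular)."""
    if a is None or b is None:
        return 0.0
    diff = abs(a - b)
    diff = min(diff, 24 - diff)
    return 1.0 - diff / 12.0


def similarity(suspect_raw, known_good_raw):
    """Return a score in [0, 1]; higher means more like the known-good mail."""
    s, k = features(suspect_raw), features(known_good_raw)
    score = 0.0
    if s["mailer"] and s["mailer"] == k["mailer"]:
        score += WEIGHTS["mailer"]
    # Jaccard overlap of the MIME part types (plain text, HTML, images, ...).
    union = s["structure"] | k["structure"]
    score += WEIGHTS["structure"] * len(s["structure"] & k["structure"]) / len(union)
    score += WEIGHTS["hour"] * hour_closeness(s["hour"], k["hour"])
    return score
```

In practice you would probably compare a suspect message against several known-good messages from the same From address and look at the best or average score, and treat a low score as a reason to ask for confirmation rather than as proof of spoofing.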