Creating a threaded private messaging system like facebook and gmail

bigmike7801 picture bigmike7801 · Jun 21, 2011 · Viewed 20.7k times · Source

I am creating a threaded message system much like gmail and facebook where the inbox lists the most recent threads displaying the subject, the sender's name, and time-stamp of the most recent message.

Here's How my tables are set up:

users:
    user_id
    user_name

thread:
    thread_id
    title
    to_id
    to_keep
    to_read
    from_id
    from_keep
    date

message:
    message_id
    thread_id
    to_id
    from_id
    message_text
    date

What I'm doing right now is when a user creates a new message, it creates a new thread in the thread table and then a new message in the message table and if a user responds to a thread, it duplicates the current thread in the thread table except it swaps the to_id and from_id and then creates a new message based on that.

Also, for the inbox view, I'm able to just query all threads based on a user_id. so something like SELECT * FROM thread WHERE to_id = 2 and to_keep = TRUE ORDER BY date DESC or if I want to view messages in the outbox it would be something like SELECT * FROM thread WHERE from_id = 2 and from_keep = TRUE ORDER BY date DESC.

If a user opens a thread when there's a new message, then to_read is updated to true UPDATE thread SET to_read = TRUE WHERE thread_id = 4.

I feel like I'm over complicating this process and that there should be a better way to do this.

Any help or ideas would be appreciated.

This way let's me just select everything from the thread table and then do a join with the user table to display everything I need. However I feel like there should be a better way to do this.

Answer

Joel Brown picture Joel Brown · Jun 21, 2011

Why don't you separate out message relationships from user's view of each message?

I would do the threading by a self-referencing relationship on message. In other words, the message has a "responding_to_message_id" column.

I'm not sure I understand why you have a "to_id". Are messages directed to individual users? This seems very limited. I would think that you would either have no recipient (i.e. the recipient is a message board that anyone can read) or you would have the ability to specify multiple recipients, just like with an e-mail. Perhaps you can explain more about how the system is to be used.

Assuming (for simplicity) that you are posting to a board, so only the "from" is important, then you have your message table, with self-referencing relationship for threading, a user table, and then an intersection table between user and message that stores which messages have been read by each user.

That way, if you want to know if a user has read a message or not, just attempt to read the user ID in the intersection table for the given message. If it isn't there, then that message is unread by that user.

Note that if you want to have single recipients this design holds and if you want to have multiple recipients you can use an intersection table to hold the list of recipients for each message. If you do have a recipient intersection table, it can do double-duty as your read status table.

EDIT: ERD Sketch:

Here is a quick sketch of what I'm talking about...

ERD Sketch

Whether or not the sender has chosen to keep the message is flagged on the message itself. If the message is the start of a new thread, the reply_to_message_id column is NULL, otherwise it is the message_id of the parent message. There can be mulitple recipients, each of which have their own ability to keep the message or not, as well as the ability to track the date and time when the recipient reads the message.

EDIT 2: Alternate ERD and Querying for Most Recent Message

@OP asked how to query for the most recent message in a thread. The answer depends on the form of the thread. You can either have a flat thread where every message goes to the end of the linear stream of messages or you can have a tree-shaped thread where each message has a specific parent, unless it's the root of the thread. In the ERD above, the reply_to_message_id field could be used either way. If the thread is flat, then the FK is always to the root MESSAGE. If the thread is tree-shaped, then the FK is to the immediate parent of the reply MESSAGE.

If a typical query you want to run is "what is the most recent message in a thread?" and your threads are flat, then you can use SQL like this:

select top 1
  M.message_id
, M.sent_datetime
, M.title
, M.message_text
, S.user_id
, S.user_name
-- and anything else you want...
from MESSAGE M inner join USER S
  on M.sender_user_id = U.user_id
where M.reply_to_message_id = @ThreadRootMessageID
order by
  M.sent_datetime desc

If, on the other hand, your threads are tree-shaped and this is a query you want to be able to run quickly and easily, then the schema in the ERD above is not very easy to work with. SQL is not good at trees. You can solve the problem with a little bit of denormalization. See the ERD below:

Tree Thread ERD

Note that there is now one FK to show the immediate parent and one FK to show the root. Since threads aren't subject to editing - at least to edits where the root of a message is changed to point at a different thread, the denormalization that this entails does not imply risk of update anomallies so the redundancy is not too problematic.

If you use this ERD then the query for "most recent message in thread X" is the same as above, but with M.thread_root_message_id in the where clause instead of M.reply_to_message_id.