What is the YouTube comment system's sorting / ranking algorithm?

TrungDQ · Jan 5, 2015 · Viewed 23.9k times

YouTube provides two sorting options: "Newest first" and "Top comments". "Newest first" is simple: the comments are sorted by post date. But "Top comments" seems to be a lot more complex than just sorting by the number of thumbs-up.


After some brief research, I found that the order of comments depends on these things:

  • Number of "thumb up"s and "thumb down"s
  • Post date
  • Number of replies to that comment

But I don't know how YouTube uses this information to decide the order, for example which factors matter more and which matter less.

Is there any article about this topic that I could refer to?

Thanks!

Answer

Cole Dixon · Aug 20, 2016

I have the answer to your question.

After searching the internet for the answer to this, I never found precisely what I was looking for. So my colleagues and I decided to experiment with the YouTube comment system ourselves.

First of all, we sorted what we believed to be popular videos into one group, average videos into another, and less popular videos into a third. There were 200 videos in each group, and after days of examining them we started to notice a pattern. We found that you were right about the three factors, but we also dug a little deeper and found an additional variable.

The YouTube comment system depends on four things:

1) Time it was posted,

2) Like/dislike ratio of a comment,

3) Number of replies,

4) And, believe it or not, WHO posted it.

The average like/dislike ratio of every public comment you've ever posted factors into it. Our guess is that YouTube assumes people with low like/dislike ratios tend to post comments that many people dislike or simply disagree with.

There is an algorithm to it, and it is quite a bit simpler than you might think. Basically, there are these things that we called "module points," and a comment gets a certain number of them based on these four factors. First, here is what you need to know about module point conversion for TWO of the factors:

  • For the like/dislike ratio of the comment, multiply the ratio by ten to get module points.

  • Each reply to the comment (NOT counting replies from the original poster) is worth two module points.

These two basic factors determine how many module points the comment has.

For example, if a comment had 27 likes and 8 dislikes, the ratio would be 27/8 = 3.375. Multiplying by 10 gives 33.75 module points. For the next factor, the number of replies, let's say this comment has 4 direct replies. Multiplying 2 by 4, we get 8. Adding 8 to the running total gives 41.75 module points.
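The arithmetic above can be sketched in a few lines of Python. The function name `module_points` and its parameters are hypothetical, and the zero-dislike handling is an assumption, since nothing here says what happens when the ratio is undefined.

```python
def module_points(likes: int, dislikes: int, replies: int) -> float:
    """Module points as described above: 10 * like/dislike ratio + 2 per reply.

    `replies` should count only replies that were not written by the
    comment's own author.
    """
    if dislikes == 0:
        # Assumption: this case is not covered by the description,
        # so the sketch refuses to guess a value.
        raise ValueError("like/dislike ratio is undefined with zero dislikes")
    ratio = likes / dislikes
    return 10 * ratio + 2 * replies


# The worked example: 27 likes, 8 dislikes, 4 direct replies.
print(module_points(27, 8, 4))  # 41.75
```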

But we're not done here; this is where it gets tricky.

Using the average like/dislike ratio across all the comments a person has ever posted publicly, we found that the following formula is applied on top of the accumulated module points:

C = MP × (R/3) + (MP/10)

where C = Comment Position Variable; MP = Module Points; R = the person's overall like/dislike ratio
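As a rough sketch, the formula for C translates directly to Python. The names `position_variable` and `author_ratio` are made up for illustration, and the author ratio of 3.0 in the example is an invented value, not a measured one.

```python
def position_variable(mp: float, author_ratio: float) -> float:
    """C = MP * (R / 3) + MP / 10, following the formula above.

    mp           -- module points of the comment (MP)
    author_ratio -- the author's overall like/dislike ratio (R)
    """
    return mp * (author_ratio / 3) + mp / 10


# Hypothetical example: the 41.75-point comment from earlier,
# posted by someone whose overall ratio happens to be 3.0.
c = position_variable(41.75, 3.0)
print(round(c, 3))  # about 45.925
```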

Trust me, we spent DAYS just on this part, which was probably the most frustrating. Even though the 3 and the 10 in this equation seem random and unnecessary, every comment we tested this equation on passed the test so far, and none of them passed when those two constants were removed. This equation gives you a number that we named the Position Variable.

However, we are still not done: we haven't talked about time yet.

I was actually quite surprised that this part didn't take as long as I expected, but it sure was a pain doing this equation every single time for every comment we tested. At first, when testing it, we figured that time was just there to break the tie when two comments had equal Position Variables.

In fact, I almost called it a wrap on the experiment at that point, but upon further inspection, we found out there was more to do. We found that some comments with the same Position Variable outranked others, yet the timing seemed to be random! After a few days of inspection, here is where the final result comes in:

There is yet ANOTHER equation that we had to find before applying the 4th variable. Here's what our algebraic deductions came down to:

X = 1/3(S/10 + A) × |A - 3S|

where X = Timing Variable; S = How long ago the video was posted in minutes; A = How long ago the comment was posted in minutes
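Here is a minimal Python sketch of the Timing Variable. One caveat: the leading "1/3(...)" can be read two ways, and this sketch assumes it means one third of (S/10 + A), multiplied by |A - 3S|. The function and parameter names are hypothetical, and the ages in the example are invented.

```python
def timing_variable(video_age_min: float, comment_age_min: float) -> float:
    """X = (1/3) * (S/10 + A) * |A - 3S|.

    Assumption: "1/3(S/10 + A)" is read as one third of (S/10 + A).

    video_age_min   -- S, minutes since the video was posted
    comment_age_min -- A, minutes since the comment was posted
    """
    s, a = video_age_min, comment_age_min
    return (s / 10 + a) / 3 * abs(a - 3 * s)


# Hypothetical example: a 600-minute-old video, a 300-minute-old comment.
# (600/10 + 300) / 3 = 120, |300 - 1800| = 1500, and 120 * 1500 = 180000.
print(timing_variable(600, 300))  # 180000.0
```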

I wish I were making this up, but unfortunately this is how complicated the system is. There are mathematical reasons behind the other constants, but they are far too complex to explain here; it would probably take at least three paragraphs. We tested this equation on more than 150 comments, and it held for all of them.

Once you find X, which is what we called the Timing Variable, all you have to do from here is apply it to this equation:

N = X(C/4 + 1)

where N = the comment's final ranking score; X = Timing Variable; C = Position Variable

N is the answer to all your problems.

This is the final equation, the final answer. The simple conclusion: the higher N, the higher up the comment is.
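To illustrate the last step, here is a small Python sketch that computes N from already-known X and C values and sorts comments by it. The comment ids and numbers are invented purely for the example, and `rank_score` is a hypothetical name.

```python
def rank_score(timing: float, position: float) -> float:
    """N = X * (C / 4 + 1), where X is the Timing Variable and
    C is the Position Variable."""
    return timing * (position / 4 + 1)


# (comment id, X, C) -- made-up values, not real measurements.
comments = [
    ("a", 180000.0, 8.0),
    ("b", 90000.0, 40.0),
    ("c", 200000.0, 0.0),
]

# Higher N ranks higher, so sort in descending order of N.
ranked = sorted(comments, key=lambda c: rank_score(c[1], c[2]), reverse=True)
print([cid for cid, _, _ in ranked])  # ['b', 'a', 'c']
```

Note that with C = 0 the score collapses to N = X, so under this formula a comment with no module points is ranked by the Timing Variable alone.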

Note: Special thanks to my colleagues: David Mattison, Josh Williams, Diego Mendieta, Steven Orsette, and Kyle Shropshire. I could never have figured this out without them and the work they put into it.