Counting words on an HTML web page using PHP

DomingoSL · Aug 15, 2010 · Viewed 13.4k times

I need a PHP script which takes the URL of a web page and then echoes how many times each word is mentioned.

Example

This is a generic HTML page:

<html>
<body>
<h1> This is the title </h1>
<p> some description text here, <b>this</b> is a word. </p>
</body>
</html>

This will be the PHP script:

<?php
$htmlurl = "generichtml.com";
// ... the script goes here and fills $result ...
echo $result;
?>

So the output will be a table like this:

WORDS       Mentions
This        2
is          2
the         1
title       1
some        1
description 1
text        1
here        1
a           1
word        1

This is similar to what search bots do when they crawl the web. Any idea how to begin or, even better, do you have a PHP script which already does this?

Answer

Peter Ajtai · Aug 15, 2010

The one-liner below does a case-insensitive word count after stripping all HTML tags from your string.

print_r(array_count_values(str_word_count(strip_tags(strtolower($str)), 1)));
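
One caveat worth checking: strip_tags() removes tags without inserting whitespace, so words separated only by markup run together. A minimal sketch of a workaround, assuming it is acceptable to treat every tag as a word boundary (the helper name count_words_spaced is hypothetical, not part of the original one-liner):

```php
<?php
// Hypothetical helper: pads every tag with a leading space before stripping,
// so "<p>one</p><p>two</p>" yields "one" and "two" rather than "onetwo".
function count_words_spaced(string $html): array
{
    $spaced = str_replace('<', ' <', strtolower($html));
    return array_count_values(str_word_count(strip_tags($spaced), 1));
}

$str = '<p>one</p><p>two</p><p>One</p>';

// Original one-liner: the three paragraphs collapse into a single "word".
print_r(array_count_values(str_word_count(strip_tags(strtolower($str)), 1)));

// Padded variant: "one" is counted twice, "two" once.
print_r(count_words_spaced($str));
```

The trade-off is that a word with inline markup inside it, such as th<b>is</b>, is split into two words.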

To grab the source code of a page, you can use cURL or file_get_contents() (the latter needs allow_url_fopen enabled to read URLs):

$str = file_get_contents('http://www.example.com/');
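
For the cURL route, something along these lines should work. This is only a sketch: it assumes the cURL extension is installed, and the function name fetch_page and the 10-second timeout are illustrative choices, not anything from the original answer.

```php
<?php
// Hypothetical helper: fetches $url with cURL and returns the response body,
// or null if the request fails.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Usage (performs a real HTTP request):
// $str = fetch_page('http://www.example.com/');
```

Checking the return value for null before counting words avoids feeding a failed request into the one-liner.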

From inside out:

  1. Use strtolower() to make everything lower case.
  2. Strip the HTML tags using strip_tags().
  3. Create an array of the words used with str_word_count(). Passing 1 as the second argument makes it return an array containing all the words found inside the string.
  4. Use array_count_values() to count how many times each value occurs in your array of words.
  5. Use print_r() to display the results.
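
The steps above can be sketched as a small function that prints a table like the one in the question instead of print_r() output. The function name count_words, the sorting step, and the column width are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns [word => count] for the text of $html.
function count_words(string $html): array
{
    $text   = strip_tags(strtolower($html)); // steps 1 and 2: lower-case, strip tags
    $words  = str_word_count($text, 1);      // step 3: array of words
    $counts = array_count_values($words);    // step 4: word => occurrences
    arsort($counts);                         // extra step: most frequent first
    return $counts;
}

// The sample page from the question.
$html = <<<HTML
<html>
<body>
<h1> This is the title </h1>
<p> some description text here, <b>this</b> is a word. </p>
</body>
</html>
HTML;

// Step 5, formatted as a two-column table rather than print_r() output.
printf("%-12s %s\n", 'WORDS', 'Mentions');
foreach (count_words($html) as $word => $count) {
    printf("%-12s %d\n", $word, $count);
}
```

For the sample page this counts "this" and "is" twice and every other word once. Note that str_word_count() ignores digits by default, so numbers on the page are not counted.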