I don't quite understand this Markov chain program... does it take two words as a prefix, save up a list of the suffix words that follow each prefix, and then generate random words from that?
/* Copyright (C) 1999 Lucent Technologies */
/* Excerpted from 'The Practice of Programming' */
/* by Brian W. Kernighan and Rob Pike */
#include <time.h>
#include <stdlib.h> /* for srand, rand */
#include <iostream>
#include <string>
#include <deque>
#include <map>
#include <vector>
using namespace std;
const int NPREF = 2;
const char NONWORD[] = "\n"; // cannot appear as real line: we remove newlines
const int MAXGEN = 10000; // maximum words generated
typedef deque<string> Prefix;
map<Prefix, vector<string> > statetab; // prefix -> suffixes
void build(Prefix&, istream&);
void generate(int nwords);
void add(Prefix&, const string&);
// markov main: markov-chain random text generation
int main(void)
{
    int nwords = MAXGEN;
    Prefix prefix; // current input prefix

    srand(time(NULL));
    for (int i = 0; i < NPREF; i++)
        add(prefix, NONWORD);
    build(prefix, cin);
    add(prefix, NONWORD);
    generate(nwords);
    return 0;
}
// build: read input words, build state table
void build(Prefix& prefix, istream& in)
{
    string buf;

    while (in >> buf)
        add(prefix, buf);
}
// add: add word to suffix deque, update prefix
void add(Prefix& prefix, const string& s)
{
    if (prefix.size() == NPREF) {
        statetab[prefix].push_back(s);
        prefix.pop_front();
    }
    prefix.push_back(s);
}
// generate: produce output, one word per line
void generate(int nwords)
{
    Prefix prefix;
    int i;

    for (i = 0; i < NPREF; i++)
        add(prefix, NONWORD);
    for (i = 0; i < nwords; i++) {
        vector<string>& suf = statetab[prefix];
        const string& w = suf[rand() % suf.size()];
        if (w == NONWORD)
            break;
        cout << w << "\n";
        prefix.pop_front(); // advance
        prefix.push_back(w);
    }
}
According to Wikipedia, a Markov Chain is a random process where the next state is dependent on the previous state. This is a little difficult to understand, so I'll try to explain it better:
What you're looking at seems to be a program that generates a text-based Markov Chain. Essentially, the algorithm for that is as follows:
1. Read the input word by word and, for each prefix seen (a single word in this explanation, a pair of words in the program above), record every word that follows it, building a frequency table.
2. To generate text, look up the frequency table for the current prefix and pick one of its recorded followers at random.
3. Output the picked word, make it (part of) the new prefix, and repeat step 2 until a stop condition is reached.
For example, if you look at the very first sentence of this solution, you can come up with the following frequency table:
According: to(100%)
to: Wikipedia(33%), understand(33%), explain(33%)
Wikipedia: ,(100%)
a: Markov(33%), random(33%), little(33%)
Markov: Chain(100%)
Chain: is(100%)
is: a(67%), dependent(33%)
random: process(100%)
process: where(100%)
.
.
.
better: :(100%)
Essentially, the transition from one state to another is probability-based. In the case of a text-based Markov Chain, the transition probability is based on the frequency of the words that follow the current word. So the current word represents the previous state, and the frequency table of its followers represents the possible successor states. You can only find the successor state if you know the previous state (that's the only way you get the right frequency table), so this fits the definition: the successive state is dependent on the previous state.
Shameless Plug - I wrote a program to do just this in Perl, some time ago. You can read about it here.