I'm looking for the specification of the TREC format. I've googled a lot but haven't found anything.
Does anyone know where to find information about it?
AFAIK, TREC stands for NIST's Text REtrieval Conference. For the indexer to know where the document boundaries fall within a file, each document must be wrapped in begin-document and end-document tags. These tags look like HTML or XML tags, and this tagged layout is what people mean by the TREC document format.
TrecParser: This parser recognizes text in the TEXT, HL, HEAD, HEADLINE, TTL, and LP fields.
Source: TREC Wikipedia
Source: Lemur Guide
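To make the tag layout concrete, here is a rough Python sketch of how such a file could be split into documents. It is only an illustration, not the Lemur TrecParser itself: I'm assuming the boundary tags are `<DOC>`/`</DOC>` with the ID in `<DOCNO>`, as I remember them from the Lemur guide, and the sample data, the `parse_trec` name, and the regex approach are all my own invention. It also assumes tags are upper case and not nested.

```python
import re

# Made-up sample in the tagged style described above (assumed layout).
SAMPLE = """\
<DOC>
<DOCNO> doc-001 </DOCNO>
<HEADLINE>First headline</HEADLINE>
<TEXT>
Body of the first document.
</TEXT>
</DOC>
<DOC>
<DOCNO> doc-002 </DOCNO>
<TEXT>
Body of the second document.
</TEXT>
</DOC>
"""

# Fields the quoted guide says TrecParser recognizes as text.
TEXT_FIELDS = ("TEXT", "HL", "HEAD", "HEADLINE", "TTL", "LP")

DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>(.*?)</DOCNO>", re.DOTALL)
FIELD_RE = re.compile(r"<(%s)>(.*?)</\1>" % "|".join(TEXT_FIELDS), re.DOTALL)


def parse_trec(data):
    """Hypothetical helper: return one dict per <DOC> block."""
    docs = []
    for match in DOC_RE.finditer(data):
        body = match.group(1)
        docno = DOCNO_RE.search(body)
        # Keep (field name, stripped text) pairs in order of appearance.
        fields = [(name, text.strip()) for name, text in FIELD_RE.findall(body)]
        docs.append({
            "docno": docno.group(1).strip() if docno else None,
            "fields": fields,
        })
    return docs


if __name__ == "__main__":
    for doc in parse_trec(SAMPLE):
        print(doc["docno"], doc["fields"])
```

Regexes are enough for a sketch like this because the files are not guaranteed to be well-formed XML, but for real collections you would probably want to rely on the toolkit's own parser.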