I am using zookeeper to get data from kafka. And here I always get data from last offset point. Is there any way to specify the time of offset to get old data?
There is one option autooffset.reset. It accepts smallest or largest. Can someone please explain what is smallest and largest. Can autooffset.reset helps in getting data from old offset point instead of latest offset point?
The consumers belong always to a group and, for each partition, the Zookeeper keeps track of the progress of that consumer group in the partition.
To fetch from the beginning, you can delete all the data associated with progress as Hussain refered
ZkUtils.maybeDeletePath(${zkhost:zkport}", "/consumers/${group.id}");
You can also specify the offset of partition you want, as specified in core/src/main/scala/kafka/tools/UpdateOffsetsInZK.scala
ZkUtils.updatePersistentPath(zkClient, topicDirs.consumerOffsetDir + "/" + partition, offset.toString)
However the offset is not time indexed, but you know for each partition is a sequence.
If your message contains a timestamp (and beware that this timestamp has nothing to do with the moment Kafka received your message), you can try to do an indexer that attempts to retrieve one entry in steps by incrementing the offset by N, and store the tuple (topic X, part 2, offset 100, timestamp) somewhere.
When you want to retrieve entries from a specified moment in time, you can apply a binary search to your rough index until you find the entry you want and fetch from there.