In Java I connect to a Cassandra cluster like this:
Cluster cluster = Cluster.builder().addContactPoints("host-001","host-002").build();
Do I need to specify all hosts of the cluster in there? What if I have a cluster of 1000 nodes? Do I randomly choose a few? How many, and should I really pick them randomly?
I would say that configuring your client with the same nodes you configured as Cassandra's seed nodes will give you the best results.
As you know, Cassandra nodes use the seed nodes to find each other and discover the topology of the ring. The driver uses only one of the contact points you provide to establish the control connection, which is the connection it uses to discover the cluster topology, so you do not need to list every host. Providing several nodes, such as the seed nodes, increases the chance that the client can still connect and keep operating when some nodes fail.
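To illustrate why more than one contact point helps, here is a minimal sketch in plain Java. It does not use the driver and is not its actual implementation: the class ContactPointSketch, the method pickControlHost, and the isReachable check are all hypothetical names made up for this example, and trying the hosts in list order is an assumption. It only models the idea that a single reachable contact point is enough for the control connection.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class ContactPointSketch {

    // Hypothetical helper: returns the first contact point that is reachable,
    // or an empty Optional if none of them is. The real driver's selection
    // order may differ; this is only a model of the fallback idea.
    static Optional<String> pickControlHost(List<String> contactPoints,
                                            Predicate<String> isReachable) {
        for (String host : contactPoints) {
            if (isReachable.test(host)) {
                return Optional.of(host);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Pretend host-001 is currently down.
        Set<String> down = Set.of("host-001");
        Predicate<String> isReachable = host -> !down.contains(host);

        // One contact point, and it is down: no host is found.
        System.out.println(pickControlHost(List.of("host-001"), isReachable));

        // Two contact points: the sketch falls back to host-002.
        System.out.println(pickControlHost(List.of("host-001", "host-002"), isReachable));
    }
}
```

With a single contact point the client in this model is stuck as soon as that node is down, while a short list (for example the seed nodes) survives the failure of any one of them. Listing all 1000 nodes adds little, since the remaining hosts are discovered through the control connection anyway.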