I'm trying to index documents into a containerized Elasticsearch instance using the Python client (https://github.com/elastic/elasticsearch-py), called from a script that also runs in a container.
Judging from existing examples, docker-compose seems to be the right tool for this. My directory structure is:
docker-compose.yml
indexer/
- Dockerfile
- indexer.py
- requirements.txt
elasticsearch/
- Dockerfile
My docker-compose.yml reads
version: '3'
services:
  elasticsearch:
    build: elasticsearch/
    ports:
      - 9200:9200
    networks:
      - deploy_network
    container_name: elasticsearch
  indexer:
    build: indexer/
    depends_on:
      - elasticsearch
    networks:
      - deploy_network
    container_name: indexer
networks:
  deploy_network:
    driver: bridge
indexer.py reads
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(hosts=[{"host": "elasticsearch"}])  # what should I put here?

actions = [
    {
        '_index': 'test',
        '_type': 'content',
        '_id': str(item['id']),
        '_source': item,
    }
    for item in [{'id': 1, 'foo': 'bar'}, {'id': 2, 'foo': 'spam'}]
]

# create index
print("Indexing Elasticsearch db... (please hold on)")
bulk(es, actions)
print("...done indexing :-)")
The Dockerfile for the elasticsearch service is
FROM docker.elastic.co/elasticsearch/elasticsearch-oss:6.1.3
EXPOSE 9200
EXPOSE 9300
and that for the indexer is
FROM python:3.6-slim
WORKDIR /app
ADD . /app
RUN pip install -r requirements.txt
ENTRYPOINT [ "python" ]
CMD [ "indexer.py" ]
with a requirements.txt containing only elasticsearch, to be installed with pip.
Running docker-compose run indexer gives me the error message at https://pastebin.com/6U8maxGX (ConnectionRefusedError: [Errno 111] Connection refused).
Elasticsearch appears to be up, as far as I can tell from curl -XGET 'http://localhost:9200/' or from docker ps -a.
How can I modify my docker-compose.yml or indexer.py to solve the problem?
The issue is a startup race: elasticsearch hasn't fully started when indexer tries to connect to it. depends_on only controls the order in which the containers are started; it does not wait for the service inside the container to be ready to accept connections. You'll have to add some retry logic that makes sure elasticsearch is up and running before you run queries against it. Something like calling es.ping() in a loop with exponential backoff until it succeeds should do the trick.
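As a rough illustration of that retry loop, here is a minimal sketch. The helper name wait_until_ready and its parameters (max_attempts, base_delay, max_delay, sleep) are made up for this example and are not part of elasticsearch-py; the helper accepts any zero-argument callable, so es.ping can be passed in directly. It assumes a truthy return value means the service is ready, and it treats any exception raised by the callable as "not ready yet".

```python
import time


def wait_until_ready(ping, max_attempts=8, base_delay=1.0, max_delay=30.0,
                     sleep=time.sleep):
    """Call ping() until it returns a truthy value, backing off exponentially.

    Hypothetical helper: `ping` is any zero-argument callable, e.g. es.ping.
    Returns the number of attempts used, or raises TimeoutError if every
    attempt fails.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            if ping():
                return attempt
        except Exception:
            # Assumption: any client error here means "not ready yet".
            pass
        if attempt < max_attempts:
            sleep(delay)
            # Double the wait each round, capped at max_delay seconds.
            delay = min(delay * 2, max_delay)
    raise TimeoutError(f"service not ready after {max_attempts} attempts")
```

In indexer.py this would be called once, right after creating the client and before bulk(es, actions), for example as wait_until_ready(es.ping). The default values are arbitrary and may need tuning to how long Elasticsearch takes to start on your machine.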
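UPDATE: The Docker HEALTHCHECK instruction can be used to achieve a similar result (i.e. make sure that elasticsearch is up and running before trying to run queries against it). Note that HEALTHCHECK only reports the container's health status; something still has to wait on that status, for example a depends_on condition of service_healthy in Compose file versions that support it.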