I am trying to back up a DynamoDB table to S3. The Export function in the AWS console does not work for me for some reason, and since the table is not that big, I am trying to do it with a boto-based script. Here's the main block of my script:
import boto.dynamodb2
from boto.dynamodb2.table import Table
c_ddb2 = boto.dynamodb2.connect_to_region(...)
table = Table("myTable", connection=c_ddb2)
# also connect to S3
scanres = table.scan()
for item in scanres:
    # process and store next item
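For reference, the "process and store" part is roughly the following; the bucket name and the assumption that the hash key is called "id" are just placeholders:

import json
import boto
from boto.s3.key import Key

c_s3 = boto.connect_s3()
bucket = c_s3.get_bucket("my-backup-bucket")  # placeholder bucket name

for item in scanres:
    k = Key(bucket)
    k.key = "myTable/%s" % item["id"]  # assumes the hash key is called "id"
    # default=str papers over the Decimal values that DynamoDB returns
    k.set_contents_from_string(json.dumps(dict(item.items()), default=str))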
I am getting the following exception:
Traceback (most recent call last):
File "/home/.../ddb2s3.py", line 155, in <module>
main()
File "/home/.../ddb2s3.py", line 124, in main
for it in scanres:
File "/usr/local/lib/python2.7/dist-packages/boto/dynamodb2/results.py", line 62, in next
self.fetch_more()
File "/usr/local/lib/python2.7/dist-packages/boto/dynamodb2/results.py", line 144, in fetch_more
results = self.the_callable(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/boto/dynamodb2/table.py", line 1213, in _scan
**kwargs
File "/usr/local/lib/python2.7/dist-packages/boto/dynamodb2/layer1.py", line 1712, in scan
body=json.dumps(params))
File "/usr/local/lib/python2.7/dist-packages/boto/dynamodb2/layer1.py", line 2100, in make_request
retry_handler=self._retry_handler)
File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 932, in _mexe
status = retry_handler(response, i, next_sleep)
File "/usr/local/lib/python2.7/dist-packages/boto/dynamodb2/layer1.py", line 2134, in _retry_handler
response.status, response.reason, data)
boto.dynamodb2.exceptions.ProvisionedThroughputExceededException: ProvisionedThroughputExceededException: 400 Bad Request
{u'message': u'The level of configured provisioned throughput for the table was exceeded. Consider increasing your provisioning level with the UpdateTable API', u'__type': u'com.amazonaws.dynamodb.v20120810#ProvisionedThroughputExceededException'}
The read provisioned throughput is set to 1000, so it should be more than enough. The write provisioned throughput was set to a low value when I ran the script and got the exception; I did not want to raise it, since that would interfere with occasional batch writes into the table, but why would I need to touch it at all for a read-only scan?
Why am I getting this error? The AWS console monitoring for myTable shows very few reads, way below the provisioned 1000. What am I doing wrong?
If you have checked in the AWS Management Console and verified that throttling events are occurring even though your read rate is well below the provisioned capacity, the most likely explanation is that your hash keys are not evenly distributed. As your DynamoDB table grows in size and capacity, the DynamoDB service automatically splits it into partitions, and it uses each item's hash key to determine which partition the item is stored in. Your provisioned read capacity is also split evenly among the partitions.
If you have a well-distributed hash key, this all works fine. But if your hash key is not well distributed, it can cause all or most of your reads to come from a single partition. So, for example, if you had 10 partitions and a provisioned read capacity of 1000 on the table, each partition would have a read capacity of 100. If all of your reads are hitting one partition, you will be throttled at 100 read units rather than 1000.
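To put that arithmetic into code (the partition count here is made up; DynamoDB does not report it directly):

provisioned_read_capacity = 1000
partitions = 10  # hypothetical number of partitions for a table this size
per_partition_capacity = provisioned_read_capacity / partitions  # = 100 read units
# If nearly all reads land on one partition, throttling kicks in around 100 units,
# even though the table-level metrics stay far below the provisioned 1000.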
Unfortunately, the only way to really fix this problem is to pick a better-distributed hash key and rewrite the table with those new hash values.
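A minimal sketch of what that rewrite could look like with boto's dynamodb2 layer, assuming you have already created a new table myTableV2 whose hash key is new_hash (the table name, key names, region, and the md5-based derivation are all just illustrative):

import hashlib

import boto.dynamodb2
from boto.dynamodb2.table import Table

c_ddb2 = boto.dynamodb2.connect_to_region("us-east-1")  # use your region
old_table = Table("myTable", connection=c_ddb2)
new_table = Table("myTableV2", connection=c_ddb2)  # created beforehand with hash key "new_hash"

with new_table.batch_write() as batch:  # buffers puts and flushes them in 25-item batches
    for old_item in old_table.scan():
        data = dict(old_item.items())
        # hypothetical derivation of a well-distributed hash value;
        # assumes the old hash key is called "id"
        data["new_hash"] = hashlib.md5(str(data["id"])).hexdigest()
        batch.put_item(data=data)

Note that the scan in this migration still reads from the same unevenly loaded partitions, so it is subject to the same throttling described above.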