Deleting many files from an S3 bucket
So we needed to delete a considerable number of files (around 500000, amounting to 1.6T) from an S3 bucket. With the list of files in hand, my first shot was calling
aws s3 rm s3://BUCKET/FILE
for each file. That wasn't the best idea, I have to say: first of all, it makes 500000 requests, and then it takes a looong time. And this command does not accept multiple files.
Fortunately there is aws s3api delete-objects, which takes JSON input and can delete multiple files in one request:
aws s3api delete-objects --bucket BUCKET --delete '{"Objects": [ { "Key" : "FILE1" }, { "Key" : "FILE2"} ... ]}'
That did help, and with a bit of magic from bash (mapfile, which can read lines from stdin in batches) and jq, in the end it was a matter of some 20 minutes:
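# Read the key list in batches of 500 lines; stop once a batch comes back empty.
cat files-to-be-deleted | while mapfile -t -n 500 ary && ((${#ary[@]})); do
  # Turn the batch into the JSON payload expected by delete-objects.
  objdef=$(printf '%s\n' "${ary[@]}" | jq -nR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}')
  aws s3api --no-cli-pager delete-objects --bucket BUCKET --delete "$objdef"
done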
This reads 500 file names at a time and uses jq to reformat them into the proper JSON format: reduce inputs is a jq filter that iterates over the input lines and does a map/reduce step. In this case we start with an empty array and append a new Key/filename pair for each line. Finally, the whole batch is sent to AWS with the above API call.
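To see what the jq step produces without touching AWS, here is a small sketch that runs the same filter on three made-up keys (abc, def, xyz are sample values, not real object names). It also shows a shorter filter that, as far as I can tell, builds the same payload by collecting the input lines directly; the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Requires jq. The keys below are sample values for illustration only.
keys=$(printf '%s\n' abc def xyz)

# Same filter as in the loop above; -c only makes the output compact.
with_reduce=$(printf '%s\n' "$keys" |
  jq -cnR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}')

# Shorter alternative: collect every input line into an array of Key objects.
with_collect=$(printf '%s\n' "$keys" |
  jq -cnR '{Objects: [inputs | {Key: .}]}')

echo "$with_reduce"    # {"Objects":[{"Key":"abc"},{"Key":"def"},{"Key":"xyz"}]}
echo "$with_collect"   # same output as above
```

Both variants rely on -n together with -R so that inputs yields each line as a raw string. The DeleteObjects API accepts up to 1000 keys per request, so batches of 500 stay within that limit.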
Puuuh, 500000 files and 1.6T less, in 20min.
Hi,
I tried the above script to delete around 30k files, but it returns \n in every key value except the last one.
For example, I have a file with three lines:
abc
def
xyz
When I execute the above script, it deletes only the last line (xyz), and for the remaining ones the output says:
"Deleted": [
  {
    "Key": "abc\n"
  },
  {
    "Key": "xyz"
  },
  {
    "Key": "def\n"
  }
E.g:
First check the generated JSON by running
echo "$objdef"
instead of calling aws. My guess is your input file has different line endings than expected on the system.
Thank you! As you said, the problem was the input file, which had Windows line endings. After converting it to Unix format, the script works as expected.
To be platform-independent, put `tr -d '\r'` after the `cat` call as a filter; it removes Windows line endings.
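A minimal sketch of that filter in front of the batching loop, assuming bash 4 or later (needed for mapfile). The function name batch_keys and the sample keys are hypothetical; it only prints each batch instead of calling aws, so it is safe to try on any input file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip carriage returns from stdin, then group the
# remaining lines into batches of at most $1 lines, one batch per output line.
batch_keys() {
  local size=$1
  tr -d '\r' | while mapfile -t -n "$size" ary && ((${#ary[@]})); do
    # Dry run: print the batch (space-separated) instead of deleting anything.
    printf '%s\n' "${ary[*]}"
  done
}

# Sample input with Windows (CRLF) line endings.
printf 'abc\r\ndef\r\nxyz\r\n' | batch_keys 2
# abc def
# xyz
```

Once the printed batches look right, the printf line can be swapped for the jq and aws s3api steps from the post.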
Thank you for the script! Worked like a charm! (I had to run it on bash v5 on Mac)