Deleting many files from an S3 bucket

So we found ourselves needing to delete a considerable number of files (around 500000, amounting to 1.6T) from an S3 bucket. With the list of files in hand, my first shot was calling

aws s3 rm s3://BUCKET/FILE

for each file. That wasn’t the best idea, I have to say: first of all it makes 500000 requests, and then it takes a looong time. And this command does not allow passing in multiple files.

Fortunately there is aws s3api delete-objects, which takes JSON input and can delete multiple files in one request:

aws s3api delete-objects --bucket BUCKET --delete '{"Objects": [ { "Key" : "FILE1" }, { "Key" : "FILE2"} ... ]}'
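To avoid hand-writing that JSON, here is a minimal sketch that builds the same structure with jq. FILE1, FILE2 and BUCKET are placeholders as above. The Quiet field is optional and was not used in the original call; as far as I know it only shortens the response to the keys that failed.

```shell
# Build the --delete payload for two placeholder keys.
# --arg hands shell strings to jq as JSON strings, so quotes or
# backslashes in a key name are escaped properly.
payload=$(jq -cn --arg k1 'FILE1' --arg k2 'FILE2' \
  '{Objects: [{Key: $k1}, {Key: $k2}], Quiet: true}')

echo "$payload"
# prints: {"Objects":[{"Key":"FILE1"},{"Key":"FILE2"}],"Quiet":true}

# The request itself would then be:
# aws s3api delete-objects --bucket BUCKET --delete "$payload"
```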

That did help, and with a bit of magic from bash (mapfile, which can read lines from stdin in batches) and jq, in the end it was a matter of some 20min or so:

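# Read the key list in batches of 500 lines; stop once a batch comes back empty.
cat files-to-be-deleted | while mapfile -t -n 500 ary && ((${#ary[@]})); do
        # Turn the batch into {"Objects": [{"Key": ...}, ...]} for delete-objects.
        objdef=$(printf '%s\n' "${ary[@]}" | jq -nR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}')
        aws s3api --no-cli-pager delete-objects --bucket BUCKET --delete "$objdef"
done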

This reads 500 file names at a time and uses jq to reformat them into the proper JSON: reduce inputs is a jq expression that iterates over the input lines and folds them into a single value. Here we start with an empty array and append one key/filename object per line. Finally, the whole batch is sent to AWS with the above API call.
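To watch the batching and the jq step without touching a bucket, here is a small dry-run sketch. It wraps the same loop in a function (dry_run_batches is a made-up helper name, and the file names are invented) and prints one compact payload per batch instead of calling aws. It needs bash 4 or newer for mapfile.

```shell
#!/usr/bin/env bash
# Dry run of the batching loop: print one payload per batch, call nothing.
# dry_run_batches is a hypothetical helper; $1 is the batch size.
dry_run_batches() {
    local size=$1 ary
    # mapfile fills ary with up to $size lines; an empty batch ends the loop.
    while mapfile -t -n "$size" ary && ((${#ary[@]})); do
        printf '%s\n' "${ary[@]}" |
            jq -cnR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}'
    done
}

# Three invented names with a batch size of 2.
printf '%s\n' a.txt b.txt c.txt | dry_run_batches 2
```

With these three names and a batch size of 2, the first payload carries a.txt and b.txt and the second carries only c.txt. At 500 keys per call, the 500000 files above take 1000 requests instead of 500000. A single delete-objects call accepts at most 1000 keys, so the batch size could be doubled.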

Phew, 500000 files and 1.6T less, in 20min.

6 Responses

  1. Uma says:

    Hi,

    I tried the above script to delete around 30k files, but it returns \n for every key except the last one.
    For example, I have a file with three lines:
    abc
    def
    xyz
    When I execute the above script, it deletes only the last line (xyz), and for the rest the output says:
    "Deleted": [
    {
    "Key": "abc\n"
    },
    {
    "Key": "xyz"
    },
    {
    "Key": "def\n"
    }

    E.g:

    • First check the generated JSON by just running echo "$objdef" instead of calling aws. My guess is that your input file has different line endings than the system expects.

      • uma devi says:

        Thank you! As you said, the problem was the input file, which had Windows line endings. After converting it to Unix format, the script works as expected.

        • hron84 says:

          To be platform-independent, put `tr -d '\r'` after the `cat` call as a filter; it removes Windows line endings.

  2. Jagan says:

    Thank you for the script! Worked like a charm! (I had to run it on bash v5 on Mac)

