How to list files in an S3 bucket with AWS CLI and Python

Reading Time: 2 minutes

During development or debugging, checking whether a file is present in S3 is one of the most common operations, and there are various ways to list the files in an S3 bucket. Let’s look at two of them here.

Methods to list all S3 files

  • List S3 Files with AWS CLI
  • List S3 Files with Python

The examples below assume that you have the AWS CLI and Python installed. If the AWS CLI is not installed, refer to the AWS documentation to set it up.

List S3 Files with AWS CLI

On the command line, we will mainly use the s3 commands.

The command below lists the objects in a specific bucket.

# List files in my-bucket hosted at 'http://s3-url'
aws --endpoint-url=http://s3-url s3 ls s3://my-bucket

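The command above lists only the top level of the bucket: each object appears on one line with its last-modified date, time, size in bytes and name, and each folder appears as a PRE entry. It does not print JSON; if you want the full detail of every object as JSON, the s3api commands described further below can return it.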
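The same top-level view can be reproduced in Python. The sketch below is only an illustration: it assumes response pages shaped like those from boto3's `list_objects_v2` when called with `Delimiter='/'`, where folders arrive under `CommonPrefixes` and objects under `Contents`. The function name `top_level_entries` and the sample data are made up for this example.

```python
def top_level_entries(pages):
    """Split list_objects_v2 response pages into (folders, files).

    Each page is a dict shaped like a boto3 list_objects_v2 response
    requested with Delimiter='/': sub-folders arrive under
    'CommonPrefixes' and objects at that level under 'Contents'.
    """
    folders, files = [], []
    for page in pages:
        # Either key may be missing when a page has no entries of that kind.
        folders.extend(p["Prefix"] for p in page.get("CommonPrefixes", []))
        files.extend(obj["Key"] for obj in page.get("Contents", []))
    return folders, files


# Hand-written sample pages standing in for a real response (illustrative data).
sample_pages = [
    {
        "CommonPrefixes": [{"Prefix": "sub-folder/"}],
        "Contents": [{"Key": "my-file2.txt"}],
    },
    {},  # an empty page must not raise
]

folders, files = top_level_entries(sample_pages)
for name in folders:
    print("PRE", name)
for name in files:
    print(name)
```

With a real client, the pages would come from something like `s3_client.get_paginator('list_objects_v2').paginate(Bucket='my-bucket', Delimiter='/')`.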

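The command below lists every file under a folder recursively, one per line, with human-readable sizes and a summary of the total object count and size at the end.

# List all files under 'folder-name' in my-bucket, with readable sizes and a summary
aws s3 ls s3://my-bucket/folder-name --recursive --human-readable --summarize

# Results will look something like this
2020-01-01 03:00:01    3.1 KiB folder-name/sub-folder/my-file1.txt
2020-01-02 03:00:01    3.1 KiB folder-name/my-file2.txt

Total Objects: 2
   Total Size: 6.2 KiB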
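If you need the totals that `--summarize` reports inside a script, they can be computed from the listing pages. This is a minimal sketch, assuming pages shaped like boto3 `list_objects_v2` responses (each object carries `Key` and `Size`). The helper names `summarize` and `human_size` are hypothetical, and the formatting only roughly mirrors the CLI's output.

```python
def human_size(num_bytes):
    """Format a byte count with binary units, e.g. 3174 -> '3.1 KiB'."""
    size = float(num_bytes)
    for unit in ("Bytes", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that fits, or at the largest unit we know.
        if size < 1024 or unit == "TiB":
            if unit == "Bytes":
                return f"{int(size)} Bytes"
            return f"{size:.1f} {unit}"
        size /= 1024


def summarize(pages):
    """Return (object_count, total_bytes) across all response pages."""
    total_objects = 0
    total_bytes = 0
    for page in pages:
        # 'Contents' is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            total_objects += 1
            total_bytes += obj["Size"]
    return total_objects, total_bytes


# Illustrative data only; real pages come from a list_objects_v2 paginator.
sample_pages = [
    {"Contents": [{"Key": "folder-name/sub-folder/my-file1.txt", "Size": 3174}]},
    {"Contents": [{"Key": "folder-name/my-file2.txt", "Size": 3174}]},
]

count, size = summarize(sample_pages)
print("Total Objects:", count)
print("Total Size:", human_size(size))
```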

If you only want to list the file names, including the folder name as a prefix, use the s3api command with a --query filter. We also have a reference here that illustrates the differences between the s3 and s3api commands.

# List only the object keys in my-bucket
aws --endpoint-url=http://s3-url s3api list-objects --bucket my-bucket --output text --query "Contents[].{Key:Key}"

# Results will look something like this
sub-folder/my-file1.txt
my-file2.txt

List S3 Files with Python

The other easy way is with Python code.

Step 1: Let’s install boto3 python package.

python -m pip install boto3

Step 2: We create a file called list_files.py

For ease of illustration, we pass the credentials directly to the boto3 session client. There are more secure ways to do this, such as putting the credentials in a configuration file.
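One way to keep secrets out of the script is to rely on boto3's default credential lookup, which reads the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables or the shared `~/.aws/credentials` file when no keys are passed in code. The sketch below assumes that approach. `S3_ENDPOINT_URL` is a hypothetical variable name chosen for this example, not one boto3 is assumed to read itself.

```python
import os


def client_settings(env=None):
    """Build keyword arguments for session.client() without any credentials.

    Credentials are deliberately left out so that boto3 resolves them
    from its default chain (environment variables or ~/.aws/credentials).
    """
    env = os.environ if env is None else env
    settings = {"service_name": "s3"}
    # S3_ENDPOINT_URL is a hypothetical name used only in this sketch.
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        settings["endpoint_url"] = endpoint
    return settings


def make_client():
    """Create an S3 client using the settings above."""
    import boto3  # imported lazily so client_settings() works without boto3

    session = boto3.session.Session()
    return session.client(**client_settings())


# Show what would be passed to session.client() for a sample environment.
print(client_settings({"S3_ENDPOINT_URL": "http://localhost:9000"}))
```

Calling `make_client()` then gives a client equivalent to the one in the script, with the keys supplied by the environment rather than the source file.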

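import boto3

# Create a session; credentials are passed inline below only for illustration
session = boto3.session.Session()

s3_client = session.client(
    service_name='s3',
    aws_access_key_id='your_access_key',
    aws_secret_access_key='your_secret_access_key',
    endpoint_url='http://xxxx:port',
)

# list_objects_v2 returns at most 1,000 keys per call, so use a paginator
paginator = s3_client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket-name')

for page in pages:
    # 'Contents' is absent when a page has no objects
    for obj in page.get('Contents', []):
        print(obj['Key'])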
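To list only one folder, as the CLI examples do with `folder-name`, the listing loop can be wrapped in a function that passes `Prefix` to the paginator. The sketch below is an assumption-laden illustration: `iter_keys` is a hypothetical helper, and `FakeS3Client` is a stand-in written for this example so the code runs without AWS access. A real boto3 S3 client can be passed in its place.

```python
def iter_keys(s3_client, bucket, prefix=""):
    """Yield object keys in `bucket` whose key starts with `prefix`."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent when nothing matches the prefix.
        for obj in page.get("Contents", []):
            yield obj["Key"]


class FakePaginator:
    """Stand-in paginator that serves one key per page (illustration only)."""

    def __init__(self, keys):
        self._keys = keys

    def paginate(self, Bucket, Prefix=""):
        matching = [k for k in self._keys if k.startswith(Prefix)]
        if not matching:
            yield {"KeyCount": 0}  # an empty page without 'Contents'
        for key in matching:
            yield {"Contents": [{"Key": key}]}


class FakeS3Client:
    """Stand-in for a boto3 S3 client, exposing only get_paginator."""

    def __init__(self, keys):
        self._keys = keys

    def get_paginator(self, operation_name):
        assert operation_name == "list_objects_v2"
        return FakePaginator(self._keys)


fake = FakeS3Client([
    "folder-name/sub-folder/my-file1.txt",
    "folder-name/my-file2.txt",
    "other/file3.txt",
])
keys = list(iter_keys(fake, "my-bucket", prefix="folder-name/"))
for key in keys:
    print(key)
```

Because the client is passed in as an argument, the same function works against a real bucket and can be exercised in tests without network access.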

Step 3: Execute the file

We will redirect the printed results into a text file.

python list_files.py > list_result.txt

Conclusion

There are many ways to list files in S3, and you have seen two of them in this article. The AWS CLI is the most common way to list files during development and debugging, while the Python approach is useful when the listing needs to be integrated into other processes.
