Granting AWS Glue Crawler Access to a Cross-Account S3 Bucket

Imagine you’re working with AWS services spread across multiple accounts. Your data is stored in an Amazon S3 bucket in one account (Account B), but your AWS Glue service is hosted in another (Account A). Establishing a connection between these services can seem daunting, but fear not! This blog post will walk you through the necessary steps to empower your AWS Glue crawlers with cross-account access to S3 buckets.

Understanding the Challenge

Integrating services across AWS accounts requires attention to detail with IAM roles and policies. Without proper configuration, your AWS Glue crawler will be denied access, leading to frustrating troubleshooting sessions. Let’s simplify this process.

Scenario at Hand:

Account A: Owns the AWS Glue service.
Account B: Contains the S3 bucket needing to be crawled.

The goal is clear: Enable the AWS Glue crawler in Account A to analyze data stored within the S3 bucket in Account B.

The Resolution Path

Setting up cross-account access requires an explicit allow on both sides: a bucket policy in Account B that permits Account A, and an IAM role in Account A whose own policy permits the same S3 actions. Here’s what needs to be done:

Steps in Account B:

Edit S3 Bucket Policy
Start by editing the bucket policy so that it allows access from Account A. Replace your-s3-bucket-datasets with your bucket name and <ACCOUNT-A-ID> with Account A’s ID:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<ACCOUNT-A-ID>:root"
                ]
            },
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::your-s3-bucket-datasets",
                "arn:aws:s3:::your-s3-bucket-datasets/*"
            ]
        }
    ]
}

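This policy delegates access to Account A as a whole: the :root principal lets any identity in Account A use the bucket, provided that identity’s own IAM policy also allows the action. The grant covers listing the bucket and reading and writing objects, not full access. A crawler only needs the list and read actions, so drop s3:PutObject unless something in Account A also writes to the bucket, and add further actions only when you need them. To narrow the grant, replace the :root principal with the ARN of the Glue service role.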
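If you prefer to script this step, the bucket policy above can be generated and applied from Python. This is a minimal sketch, not the only way to do it: the function names and the Sid value are made up for illustration, and s3_client is assumed to be a boto3 S3 client created with Account B credentials.

```python
import json


def build_bucket_policy(account_a_id: str, bucket: str) -> dict:
    """Build a bucket policy that delegates list/read/write access to Account A."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAccountAAccess",  # illustrative name
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_a_id}:root"},
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:PutObject",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # bucket-level actions
                    f"arn:aws:s3:::{bucket}/*",    # object-level actions
                ],
            }
        ],
    }


def apply_bucket_policy(s3_client, bucket: str, policy: dict) -> None:
    """Attach the policy to the bucket (assumes a boto3 S3 client for Account B)."""
    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```

With Account B credentials configured, a call might look like apply_bucket_policy(boto3.client("s3"), "your-s3-bucket-datasets", build_bucket_policy("111122223333", "your-s3-bucket-datasets")), where the account ID is a placeholder. Note that put_bucket_policy replaces the whole bucket policy, so merge in any existing statements first.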

Steps in Account A:

Configure AWS Glue Service Role
Create an IAM role called AWSGlueServiceRole-project-name if it doesn’t already exist. Then attach a trust policy that lets the AWS Glue service assume the role through AWS Security Token Service (STS):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1",
            "Effect": "Allow",
            "Principal": {
                "Service": "glue.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Note: Don’t add an sts:ExternalId condition to this trust policy. An external ID is meant for roles that a third party assumes from another AWS account; the Glue service does not supply one, so a StringEquals condition on sts:ExternalId would stop Glue from assuming the role.
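The role setup could be scripted roughly as follows. This sketch assumes the role is assumed only by the Glue service, which does not pass an external ID, so the trust policy contains just the service principal. Attaching the AWS managed policy AWSGlueServiceRole for the baseline Glue permissions is also an assumption, not something this post requires. The function names are illustrative, and iam_client is assumed to be a boto3 IAM client for Account A.

```python
import json

# AWS managed policy with the baseline permissions Glue needs (assumed choice).
GLUE_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def build_glue_trust_policy() -> dict:
    """Trust policy that lets the AWS Glue service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_glue_role(iam_client, role_name: str) -> None:
    """Create the role and attach the managed Glue policy (assumes a boto3 IAM client)."""
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_glue_trust_policy()),
    )
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=GLUE_MANAGED_POLICY_ARN,
    )
```

A call such as create_glue_role(boto3.client("iam"), "AWSGlueServiceRole-project-name") would fail if the role already exists, so check for it first in that case.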

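Next, attach an identity-based IAM policy to AWSGlueServiceRole-project-name that allows the S3 actions the crawler needs on the bucket in Account B. Cross-account access succeeds only when both the bucket policy in Account B and this role policy in Account A allow the request.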
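One plausible shape for that role policy, sketched in Python. It assumes the crawler only needs to list the bucket and read objects; the function names, Sid values, and the policy name CrossAccountS3Read are hypothetical, and iam_client is assumed to be a boto3 IAM client for Account A.

```python
import json


def build_role_s3_policy(bucket: str) -> dict:
    """Identity policy granting read-only access to the bucket in Account B."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListCrossAccountBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Sid": "ReadCrossAccountObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


def attach_inline_policy(iam_client, role_name: str, bucket: str) -> None:
    """Add the policy to the role as an inline policy (assumes a boto3 IAM client)."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName="CrossAccountS3Read",  # hypothetical policy name
        PolicyDocument=json.dumps(build_role_s3_policy(bucket)),
    )
```

Splitting the statements keeps s3:ListBucket on the bucket ARN and the object actions on the object ARNs, which makes it easier to tighten either one later.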

Update the AWS Glue Crawler Configuration
In the AWS Glue console within Account A, configure the crawler to use the newly created role when accessing resources.
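The same configuration can be done outside the console. The sketch below builds a create_crawler request that points at the bucket in Account B and uses the service role; the helper names, crawler name, and database name are placeholders, and glue_client is assumed to be a boto3 Glue client for Account A.

```python
def build_crawler_request(name: str, role_name: str, database: str,
                          bucket: str, prefix: str = "") -> dict:
    """Build keyword arguments for the Glue create_crawler call."""
    # Normalize the path so there are no leading, trailing, or doubled slashes.
    path = f"s3://{bucket}/{prefix.strip('/')}".rstrip("/")
    return {
        "Name": name,
        "Role": role_name,          # role name or full role ARN
        "DatabaseName": database,   # Data Catalog database for the resulting tables
        "Targets": {"S3Targets": [{"Path": path}]},
    }


def create_and_start_crawler(glue_client, request: dict) -> None:
    """Create the crawler, then start it (assumes a boto3 Glue client)."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])
```

For example, build_crawler_request("datasets-crawler", "AWSGlueServiceRole-project-name", "datasets_db", "your-s3-bucket-datasets") targets the whole bucket; pass a prefix to crawl only part of it.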

Final Thoughts

Once the above steps are complete, your AWS Glue crawler should have seamless access to the desired S3 bucket in Account B, allowing for efficient data analysis and processing across your AWS environments.
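To confirm that the crawl actually worked, you can inspect the crawler’s last run. The helper below is a small sketch that summarizes a get_crawler response; the function name is illustrative, and it relies only on the Crawler, State, and LastCrawl fields of that response.

```python
def summarize_last_crawl(get_crawler_response: dict) -> str:
    """Turn a Glue get_crawler response into a one-line status summary."""
    crawler = get_crawler_response["Crawler"]
    name = crawler["Name"]
    last = crawler.get("LastCrawl")
    if not last:
        # The crawler has not finished a run yet.
        state = crawler.get("State", "UNKNOWN")
        return f"{name}: no completed crawl yet (state {state})"
    status = last.get("Status", "UNKNOWN")
    if status == "FAILED":
        # Access problems on the cross-account bucket typically surface here.
        message = last.get("ErrorMessage", "no error message")
        return f"{name}: FAILED - {message}"
    return f"{name}: {status}"
```

With a boto3 Glue client, this could be used as summarize_last_crawl(glue_client.get_crawler(Name="datasets-crawler")), where the crawler name is a placeholder.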

Always review and minimize permissions to follow the principle of least privilege and keep your AWS infrastructure secure.

For more detailed instructions or any assistance, you’re welcome to check out the official AWS Documentation or reach out to the community on forums such as AWS re:Post.

Happy Crawling and stay cloud-wise!