Securing Amazon SageMaker: Attack Surface Explained

Oct 09, 2024
23 minutes
Figure 1: SageMaker attack vectors diagram

As organizations increasingly rely on Amazon SageMaker for their machine learning (ML) needs, understanding and mitigating security risks becomes paramount. Amazon SageMaker is a comprehensive platform supporting the entire ML lifecycle, including data preparation, model training, deployment and monitoring. Its integration with other AWS services, such as Bedrock, enhances its capabilities, enabling users to incorporate state-of-the-art large language models (LLMs) into their workflows.

When best practices are not followed, Amazon SageMaker’s power and flexibility may pose potential risks. The platform’s extensive features and integrations can create a broad attack surface, making it imperative to understand potential vulnerabilities and how to mitigate them.

In this blog post, we explore various features built into Amazon SageMaker. If these features are used without understanding and considering their implications, bad actors can exploit them as attack vectors. Feel free to use this blog as a tutorial for Amazon SageMaker best practices. And let us know if you find them helpful in your work, as we'll be happy to share more insights.

Palo Alto Networks and the Amazon SageMaker team collaborate to share knowledge and improve cloud and AI service security practices. Prisma Cloud’s Code to Cloud platform includes a set of modules that can help identify these misconfigurations so you can "stay ahead of the bad guys."

Table of Contents 

Attack Vector 1: Exploiting AWS Cognito-IDP Permissions

Attack Vector 2: SageMaker domain hopping by notebook creation

Attack Vector 3: SageMaker pipeline integration Lambda + Databricks use case

Attack Vector 4: Steal cross-domain Canvas secrets

Attack Vector 5: AWS Resource Group modification

Mitigating SageMaker security risks

Summary and Takeaways

Understanding the Basics

SageMaker domains act as control planes for managing ML environments and user profiles. During Quick Setup, they automatically create necessary IAM roles attached to each domain and profile to maintain separation. While these default IAM roles facilitate functionality, they can introduce security risks due to their broad access permissions, if not properly managed.

For example, the default IAM role includes policies like AmazonSageMakerFullAccess, granting access to services such as AWS Glue and AWS Lambda. If Amazon SageMaker Canvas is enabled, additional policies like AmazonSageMakerCanvasFullAccess and AmazonSageMakerCanvasAIServicesAccess are also granted, which can lead to security risks if not properly managed.

Figure 2: IAM role managed policies from a domain created via Quick Setup

Attack Vector 1: Exploiting AWS Cognito-IDP Permissions

By leveraging overprivileged AWS Cognito-IDP permissions in the AmazonSageMakerFullAccess policy, an attacker can manipulate existing user pools and exploit multiple permissions granted by the policy. This enables lateral movement attacks and control over IAM roles and other AWS services that leverage AWS Cognito User Pools as an identity provider.

Background

Amazon Cognito User Pools: These are user directories that provide sign-up and sign-in options for application users. They handle user registration, authentication, account recovery, and more.

Amazon Cognito Identity Pools: Also known as federated identities, these allow users to obtain temporary AWS credentials to access AWS services. They integrate with various identity providers, including Cognito User Pools and third-party providers.

AWS Ground Truth Private Workforce: This manual data-labeling feature enables SageMaker users to create a "labeling workforce" (group of workers either vendor-managed or chosen by the customer to label datasets) associated with a Cognito User Pool. The user pool registers users under it when inviting workers to the workforce.

Figure 3: Private workforce creates a Cognito User Pool when established

The AmazonSageMakerFullAccess policy (granted by default during Quick Setup) includes Cognito-IDP permissions which, if misused, can lead to significant security breaches of these services.

Figure 4: Illustration of AWS Cognito use case where the service can be used for API access and IAM access

Figure 5: Permissions granted from AmazonSageMakerFullAccess

Inspecting the Cognito-IDP permissions in AmazonSageMakerFullAccess reveals multiple overprivileged permissions.

Although SageMaker leverages these permissions to create user pools specifically for a labeling workforce, permissions for actions such as creating and updating user pools are set with a broad resource scope (all resources). This means that domains created via Quick Setup or those granted AmazonSageMakerFullAccess can manipulate Cognito User Pools in ways that can be exploited for destructive attacks.
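As a defensive counterpart to the observation above, the following minimal Python sketch flags policy statements that allow cognito-idp actions on all resources. It works on a policy document that has already been loaded into a dictionary. The function name and SAMPLE_POLICY are hypothetical and for illustration only; the sample is not a copy of the AWS managed policy.

```python
def _as_list(value):
    """IAM accepts a single string or a list for Action/Resource; normalize to a list."""
    return [value] if isinstance(value, str) else list(value or [])


def broad_cognito_statements(policy_document):
    """Return Allow statements that grant cognito-idp actions on Resource '*'.

    Note: a bare '*' action is not matched here; only actions that start
    with the 'cognito-idp:' prefix are considered.
    """
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a policy may hold a single statement object
        statements = [statements]
    findings = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = [a for a in _as_list(stmt.get("Action"))
                   if a.lower().startswith("cognito-idp:")]
        if actions and "*" in _as_list(stmt.get("Resource")):
            findings.append({"Sid": stmt.get("Sid"), "Actions": actions})
    return findings


# Hypothetical policy used only to exercise the function.
SAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "ExampleBroad", "Effect": "Allow",
         "Action": ["cognito-idp:AdminCreateUser",
                    "cognito-idp:UpdateUserPool",
                    "s3:GetObject"],
         "Resource": "*"},
        {"Sid": "ExampleScoped", "Effect": "Allow",
         "Action": "cognito-idp:DescribeUserPool",
         "Resource": "arn:aws:cognito-idp:us-east-1:111122223333:userpool/us-east-1_EXAMPLE"},
    ],
}
```

Running broad_cognito_statements(SAMPLE_POLICY) reports only the ExampleBroad statement, because the second statement is scoped to a single user pool ARN.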

Attack Flow Summary

  1. Create a malicious user in the user pool: Use cognito-idp:AdminCreateUser permission to create a new user with a temporary password.
  2. Add the user to all groups: Use cognito-idp:AdminAddUserToGroup to assign the user to every group within the user pool, inheriting all associated roles.
  3. Authenticate and retrieve tokens: Perform authentication flows to obtain JWTs that grant access to AWS resources based on group membership.
  4. Enumerate all identity pool roles: Identify IAM roles linked to the identity pool and assume them using the identity pool GUID to gain AWS credentials.
  5. Leverage AWS credentials for unauthorized access: Use the obtained AWS credentials to access or manipulate AWS services, leading to potential data breaches or service disruptions.
Figure 6: Diagram showing Cognito attack vector

Detailed Attack Steps

1a. Create a Malicious User

Use the cognito-idp:AdminCreateUser permission to create a new user.

After creation, the attacker needs to change the password and obtain a user pool client ID.

1b. Add Malicious User to All Groups

Cognito-idp uses ‘Groups’ to bring together users and map their IAM roles, if needed.

Each group has an IAM role attribute (the IAM role is optional).

The attackers can add themselves to each group using the aws cognito-idp admin-add-user-to-group CLI command (via the cognito-idp:AdminAddUserToGroup permission).
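A user who belongs to every group in a pool is an unusual pattern that defenders can look for. The sketch below assumes the group names and per-user memberships have already been collected (for example, from the user pool's group listings); the function name and input shapes are hypothetical.

```python
def users_in_every_group(all_groups, memberships):
    """Return usernames that are members of every group in the user pool.

    all_groups:  iterable of group names defined in the pool
    memberships: dict mapping username -> iterable of group names
    """
    required = set(all_groups)
    if not required:
        return []  # a pool without groups cannot exhibit this pattern
    return sorted(user for user, groups in memberships.items()
                  if required <= set(groups))
```

For example, with groups ["admins", "labelers"], a user that holds both memberships is returned, while a user that holds only "labelers" is not.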

1c. Authenticate and Retrieve Tokens

Authenticate using aws cognito-idp initiate-auth:

aws cognito-idp initiate-auth and aws cognito-idp respond-to-auth-challenge are API calls that are required for user authentication and can be used without any AWS permissions.

A JWT bearer token is received. If we parse the token, we can see the following:

Figure 7: Cognito User Pool JWT decode

Explanation of JWT

  • cognito:groups – Groups to which the user is attached
  • cognito:roles – Mapped roles for the groups to which the user has access
  • custom:anotherclaim – Custom attributes added to the JWT that the identity pool can check. The JWT encapsulates the groups and IAM roles available to the user. When the JWT is submitted to the identity pool for authorization, a rule-based system in the identity pool can optionally check these attributes.
Figure 8: Rule-based role mapping via JWT attributes

If "Choose role with rules" is applied, the attacker can guess what claim is needed during the user-creation process by using user enumeration.
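The claims described above can be inspected by decoding the token payload. The following is a minimal sketch using only the Python standard library. It does not verify the signature, so it is suitable for inspection only, and the helper names are hypothetical.

```python
import base64
import json


def decode_jwt_claims(token):
    """Decode a JWT payload WITHOUT verifying the signature (inspection only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def role_related_claims(claims):
    """Pick out the claims that matter for identity pool role mapping."""
    return {
        "groups": claims.get("cognito:groups", []),
        "roles": claims.get("cognito:roles", []),
        "custom": {k: v for k, v in claims.items() if k.startswith("custom:")},
    }
```

Passing the decoded claims to role_related_claims gives a compact view of which groups, roles and custom attributes a given token carries.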

1d. Enumerate and Exploit Identity Pool Roles

Understanding Trust Relationships

The trust relationship of a role mapped to an identity pool looks like this:

Figure 9: Trust relationship reveals identity-pool GUID

Now let's dissect the trust relationship policy.

Principal.Federated: cognito-identity.amazonaws.com

  • This specifies the identity provider used in the policy, enabling identities from cognito-identity.amazonaws.com.

Action: sts:AssumeRoleWithWebIdentity

  • This grants temporary security credentials to the federated user, enabling interaction with AWS services. It is useful when identities need AWS access without creating a dedicated IAM user.

Conditions

  • Each identity provider supported by AWS maintains customized conditions for more granular control over IAM role access.
  • The specific condition cognito-identity.amazonaws.com:aud checks if the request originates from a specific identity pool that is identified by its pool ID (<region>:<GUID>). This ensures that only a specific identity pool can obtain a federated token.

For more information, see IAM Roles for IDP.

The trust relationship policy leaks the identity pool GUID. The attacker cannot read this GUID directly with the given permissions, but needs it to obtain IAM credentials at a later stage.

Enumerating Identity Pool IAM Roles

A bash script using the iam:ListRoles permission from AmazonSageMakerFullAccess can check the roles, with the goal of finding the:

  • IAM role used by an identity pool
  • identity pool GUID used by the role
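The same enumeration can be sketched in Python, which defenders can also use to inventory which roles expose an identity pool ID. The sketch assumes the roles were already fetched (for example, the "Roles" list of an IAM ListRoles response with each AssumeRolePolicyDocument parsed into a dictionary) and that the condition key is written in its usual lowercase form. The function name is hypothetical.

```python
COGNITO_AUD_KEY = "cognito-identity.amazonaws.com:aud"


def _as_list(value):
    """Normalize a missing value, a single item, or a list to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, (str, dict)) else list(value)


def identity_pool_roles(roles):
    """Map role ARN -> identity pool IDs found in the role's trust policy."""
    result = {}
    for role in roles:
        trust = role.get("AssumeRolePolicyDocument") or {}
        pool_ids = []
        for stmt in _as_list(trust.get("Statement")):
            principal = stmt.get("Principal")
            if not isinstance(principal, dict):
                continue  # e.g. Principal: "*"
            federated = _as_list(principal.get("Federated"))
            if "cognito-identity.amazonaws.com" not in federated:
                continue
            # The pool ID can sit under any condition operator (StringEquals, ...).
            for operator_block in (stmt.get("Condition") or {}).values():
                pool_ids.extend(_as_list(operator_block.get(COGNITO_AUD_KEY)))
        if pool_ids:
            result[role["Arn"]] = pool_ids
    return result
```

Roles that trust only AWS services, such as sagemaker.amazonaws.com, are skipped because they have no Federated principal.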

1e. Leverage AWS Credentials for Unauthorized Access

To obtain the mapped AWS IAM credentials, the attacker needs to make two cognito-identity API calls: get-id (to get the internal ID that maps the user pool user to the identity pool) and get-credentials-for-identity (to get the AWS credentials of the mapped roles). Both API calls can be made without IAM permissions by passing the --logins flag along with the JWT obtained in step 1c:

Figure 10: Demonstration of AWS role credential acquired from Notebook CLI

Using this method, the attacker can also switch between roles by passing the --custom-role-arn flag with any role associated with the groups in the cognito:groups claim.

Now the attacker has moved between IAM roles, having escaped from the roles used in SageMaker Notebook. Depending on the client environment, the IAM roles that are mapped to the identity pool can be overprivileged or can offer an attacker more vectors for lateral movement attacks.

Bonus 1: Leaking User Pool Data

Using the cognito-idp:ListUsers permission, an attacker can access PII such as emails, phone numbers and any other attribute used in the user pool.

Bonus 2: Bypassing Security Restrictions

Disable advanced security

The Cognito User Pool has an advanced security feature that can block, notify and find login anomalies during the authentication process. This is used when some user credentials are leaked or when suspicious activity is detected during login. It can be easily turned off via the cognito-idp:UpdateUserPool permission.
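Because this setting can be switched off silently, it is worth checking periodically. The sketch below reads the AdvancedSecurityMode value from user pool descriptions that were already retrieved (the "UserPool" object of a DescribeUserPool response). Treating a missing UserPoolAddOns block as OFF is an assumption of this sketch, and the function names are hypothetical.

```python
def advanced_security_mode(user_pool):
    """Return the pool's AdvancedSecurityMode; a missing setting is treated as 'OFF'."""
    return (user_pool.get("UserPoolAddOns") or {}).get("AdvancedSecurityMode", "OFF")


def pools_without_enforcement(user_pools):
    """Return the IDs of pools whose advanced security is not in ENFORCED mode."""
    return [pool.get("Id") for pool in user_pools
            if advanced_security_mode(pool) != "ENFORCED"]
```

Comparing the result between two points in time gives a simple way to notice a pool that moved from ENFORCED to AUDIT or OFF.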

Mitigation Strategies

  • Restrict IAM permissions: Limit cognito-idp permissions to only those necessary for SageMaker operations.
  • Implement least privilege principle: Ensure roles have the minimum permissions required.
  • Monitor and audit: Regularly review IAM policies and monitor for unusual activities.
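As one way to apply the first two points, the sketch below builds a policy statement that limits cognito-idp actions to a single user pool ARN instead of all resources. The function name, Sid and example values are hypothetical, and the list of actions a workload actually needs is left to the caller. Some actions may not support resource-level scoping, so check the service authorization reference before relying on it.

```python
import json


def scoped_cognito_statement(region, account_id, user_pool_id, actions):
    """Build an Allow statement restricted to one Cognito user pool."""
    pool_arn = f"arn:aws:cognito-idp:{region}:{account_id}:userpool/{user_pool_id}"
    return {
        "Sid": "ScopedCognitoAccess",  # hypothetical statement ID
        "Effect": "Allow",
        "Action": sorted(actions),
        "Resource": pool_arn,
    }


# Example values are placeholders, not real identifiers.
EXAMPLE_STATEMENT = scoped_cognito_statement(
    "us-east-1", "111122223333", "us-east-1_EXAMPLE",
    ["cognito-idp:DescribeUserPool", "cognito-idp:AdminCreateUser"],
)
EXAMPLE_JSON = json.dumps({"Version": "2012-10-17",
                           "Statement": [EXAMPLE_STATEMENT]}, indent=2)
```

The resulting JSON can be reviewed and attached as a customer managed policy in place of the broader grant.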

Attack Vector 2: SageMaker Domain Hopping by Notebook Creation

Figure 11: Domain hopping attack vector flow

Attack Vector Overview & Impact

When creating domains in SageMaker via the Quick Setup option, the platform automatically creates necessary IAM roles with policies that enable functionality across different applications such as SageMaker Canvas and SageMaker Studio. While these roles are crucial in maintaining the separation of different domains and user profiles, they can be exploited if not properly secured.

An attacker with access to a SageMaker notebook can create a new notebook instance with an IAM role from a different domain, effectively bypassing user segregation and gaining unauthorized access to new compute and data resources (S3, RDS, AI models).

Attack Flow Summary

The lateral movement attack starts from a notebook in domain A.

  1. Enumerate IAM roles: The attacker uses the iam:ListRoles permission to identify IAM roles that have a trust relationship with sagemaker.amazonaws.com.
  2. Create a new (privileged) notebook instance: The attacker now uses the sagemaker:CreateNotebookInstance API call to create a new SageMaker Notebook with an IAM role from a different domain.
  3. Access the privileged notebook: The attacker generates a pre-signed URL using the sagemaker:CreatePresignedNotebookInstanceUrl API call to access the new notebook.

Detailed Attack Steps

2a. Enumerate IAM Roles

A typical attack begins with a reconnaissance stage, where the attacker enumerates IAM roles from the notebook in the original domain. By examining the AmazonSageMakerFullAccess policy, specifically the AllowAWSServiceActions statement, it's clear that the policy allows listing every role in the account.

Figure 12: IAM policy statement in AmazonSageMakerFullAccess

The attacker can then use the iam:ListRoles permission to enumerate the IAM roles in the target AWS account, searching for lateral movement paths. Specifically, the attacker will look for roles with a trust relationship to sagemaker.amazonaws.com to advance the attack.
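Defenders can run the same search to learn which roles are exposed to this path. The sketch below filters a list of roles (shaped like the "Roles" entries of an IAM ListRoles response, with the trust policy already parsed into a dictionary) for those that a given service can assume. The function name is hypothetical.

```python
def _as_list(value):
    """Normalize a missing value, a single item, or a list to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, (str, dict)) else list(value)


def roles_trusting_service(roles, service="sagemaker.amazonaws.com"):
    """Return ARNs of roles whose trust policy allows the given service principal."""
    matches = []
    for role in roles:
        trust = role.get("AssumeRolePolicyDocument") or {}
        for stmt in _as_list(trust.get("Statement")):
            principal = stmt.get("Principal")
            if stmt.get("Effect") != "Allow" or not isinstance(principal, dict):
                continue
            if service in _as_list(principal.get("Service")):
                matches.append(role["Arn"])
                break  # one matching statement is enough for this role
    return matches
```

Every ARN returned is a role that could be attached to a new notebook instance by anyone who holds the permissions discussed in this section.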

2b. Create a New (Privileged) Notebook Instance

Let's dive into the AmazonSageMakerFullAccess policy and look at the iam:PassRole permission. The following statement allows attaching any role in the account (that can be assumed by SageMaker) to any SageMaker resource (notebooks, training pipelines, model endpoints).

Figure 13: Lambda pass role statement taken from AmazonSageMakerFullAccess
Figure 14: Policy statement from AmazonSageMakerFullAccess

What's more, the attacker can move laterally across SageMaker using the default AmazonSageMakerFullAccess with the following statement:

AllowAllNonAdminSageMakerActions is a tricky statement, since it does not allow the attacker to perform any action in the domain, app, user profile, space or flow definition. But this exclusion does not apply to notebook instances.

To carry out the attack, the attacker needs to use the default permissions they have through the AmazonSageMakerFullAccess policy to move to another IAM role:

1) sagemaker:CreatePresignedNotebookInstanceUrl (from AllowAllNonAdminSageMakerActions)

2) sagemaker:CreateNotebookInstance (from AllowAllNonAdminSageMakerActions)

3) iam:PassRole (from AllowPassRoletoSageMaker)

Create a new notebook instance with a new IAM role (note the --role-arn parameter).

Figure 15: AWS CLI command to create a new notebook instance named privesc-notebook with SageMaker-ExecutionRole-20240521T120348 IAM role

2c. Access the Privileged Notebook

After the notebook is created, it can be accessed via a pre-signed URL generated by the CreatePresignedNotebookInstanceUrl API call.
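One way to detect this kind of domain hopping is to review CloudTrail records for notebook instances created with a role outside an approved list. The sketch below assumes the events have already been exported as dictionaries and that the record fields are named eventSource, eventName, requestParameters.roleArn and requestParameters.notebookInstanceName. These field names are an assumption about the record layout and should be confirmed against real logs. The function name is hypothetical.

```python
def unexpected_notebook_roles(events, allowed_role_arns):
    """Flag CreateNotebookInstance events that pass a role outside the allowlist."""
    allowed = set(allowed_role_arns)
    findings = []
    for event in events:
        if event.get("eventSource") != "sagemaker.amazonaws.com":
            continue
        if event.get("eventName") != "CreateNotebookInstance":
            continue
        params = event.get("requestParameters") or {}
        role_arn = params.get("roleArn")  # assumed field name
        if role_arn not in allowed:
            findings.append({
                "caller": (event.get("userIdentity") or {}).get("arn"),
                "notebook": params.get("notebookInstanceName"),  # assumed field name
                "roleArn": role_arn,
            })
    return findings
```

A finding where the caller is one domain's execution role and the passed role belongs to another domain matches the flow described in this section.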

Attack Vector 3: SageMaker Pipeline Integration with Lambda + Databricks Use Case

Figure 16: Attack vector diagram with Databricks role

Attack Vector Overview & Impact

SageMaker Studio allows running model-building pipelines and provides the AmazonSageMakerPipelinesIntegrations policy to interact with AWS services. This policy can be exploited to create and invoke Lambda functions with elevated privileges, potentially leading to a full account takeover.

An AWS account that uses this policy can be subjected to lateral movement steps that enable a full account takeover.

Attack Flow Summary

The lateral movement attack starts from a notebook in domain A.

  1. Create a Malicious Lambda Function: Use lambda:CreateFunction to create a Lambda function with a high-privilege IAM role.
  2. Invoke the Lambda Function: Execute the function to perform unauthorized actions.

3a. Create a SageMaker Lambda

Figure 17: Policy statement for Lambda functions from AmazonSageMakerPipelinesIntegrations

The policy allows any Lambda function to be created (as long as its name includes ‘sagemaker’) and allows passing any IAM role that trusts Lambda, EMR or EC2. This, of course, allows an attacker to pass roles that were never intended to be used by SageMaker.

Figure 18: AWS CLI command that creates a new function from a ZIP file from within the notebook

3b. Invoke the Lambda

After the Lambda is created, an attacker can invoke the new Lambda by simply running the aws lambda invoke CLI command.

The Lambda uses the attached IAM role when it is being invoked.

Databricks Use Case

When onboarding Databricks to an AWS account, a CloudFormation template creates relevant IAM roles and resources to allow Databricks to function. This template grants Databricks certain permissions to the AWS account along with associated cloud resources. One of the resources created is a Lambda function.

Figure 19: Databricks Lambda function created after running onboarding CloudFormation template

The Lambda function gets a bearer token as an input from an event. Depending on the event, it creates resources in Databricks (aka the customer environment). One of the interesting things is the IAM role that the Lambda uses.

Figure 20: IAM role created after onboarding

The trust relationship of the IAM role:

Figure 21: Trust relationship policy for Databricks-created IAM role allows it to be passed to any Lambda

The policies attached to the IAM role:

Figure 22: Showcasing IAM full access policy for Databricks IAM role

This shows that every Lambda can assume this role, which has an administrative privilege of full IAM access. An attacker using this attack vector can create a new Lambda that assumes this role.

This is an account takeover attack, since the attacker has full IAM privileges and can create a privileged user.
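To find roles that create this exposure, defenders can look for roles that Lambda can assume and that also carry a high-privilege managed policy. The sketch below assumes the roles and their attached policy names were collected beforehand (for example, via IAM ListRoles and ListAttachedRolePolicies). The function names, the input shapes and the short list in HIGH_PRIVILEGE_POLICIES are illustrative assumptions, not an exhaustive definition of "high privilege".

```python
# Illustrative, non-exhaustive set of AWS managed policies treated as high privilege.
HIGH_PRIVILEGE_POLICIES = {"AdministratorAccess", "IAMFullAccess"}


def _trusted_services(role):
    """Collect the service principals allowed by a role's trust policy."""
    statements = (role.get("AssumeRolePolicyDocument") or {}).get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    services = set()
    for stmt in statements:
        principal = stmt.get("Principal")
        if stmt.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        service = principal.get("Service", [])
        services.update([service] if isinstance(service, str) else service)
    return services


def risky_lambda_roles(roles, attached_policies):
    """Return (role name, risky policy names) for Lambda-assumable, high-privilege roles.

    attached_policies: dict mapping role name -> list of attached managed policy names
    """
    findings = []
    for role in roles:
        if "lambda.amazonaws.com" not in _trusted_services(role):
            continue
        risky = HIGH_PRIVILEGE_POLICIES & set(attached_policies.get(role["RoleName"], []))
        if risky:
            findings.append((role["RoleName"], sorted(risky)))
    return findings
```

Any role reported here can be passed to a newly created Lambda function by a principal with a sufficiently broad iam:PassRole grant, which is the combination this attack vector relies on.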

The following code is a Lambda function that creates a new role named 'EvilRoleAccountTakeover' with arn:aws:iam::XXXXXXXX:root in a trust relationship, meaning anyone in the account can assume it.

Figure 23: Demonstration of account takeover using Databricks role from SageMaker notebook

Running the above Python code from a JupyterLab notebook in SageMaker (with the AmazonSageMakerPipelinesIntegrations policy assigned) enables an instant account takeover, granting admin privileges from the notebook itself.

Attack Vector 4: Steal Cross-Domain Canvas Secrets

Background

Amazon SageMaker Canvas is a no-code solution for developing models, data preparation, and more. Canvas also allows data ingestion from multiple AWS and third-party resources.

SageMaker Canvas stores data source connection details as secrets in AWS Secrets Manager (figure 25) with resource-based policies restricting access.

Figure 24: Showcasing different Canvas integrations with data sources
Figure 25: Canvas connection strings stored in Secrets Manager

An attacker can list and retrieve these secrets used across all the domains without any restrictions. This is possible because of the secretsmanager:GetSecretValue permission in the AmazonSageMakerFullAccess policy and the limited extent to which Canvas restricts cross-domain access to secrets in Secrets Manager.

An attacker can leak those secrets and gain access to new services (e.g., Databricks, Snowflake, Salesforce) or to data sources that contain sensitive information.

Attack Flow Summary

  1. List secrets: Use secretsmanager:ListSecrets to identify SageMaker-related secrets.
  2. Retrieve secret values: Exploit secretsmanager:GetSecretValue permissions to access stored credentials.

4a. List Secrets

The secretsmanager:ListSecrets permission is in the AllowAWSServiceActions statement, which enables listing secrets without resource restrictions.

In AllowSecretManagerActions, we can get the values of secrets whose names start with "AmazonSageMaker-".

Figure 26: Policy statement from AmazonSageMakerFullAccess
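To understand which secrets fall under this name prefix, defenders can inventory them from a Secrets Manager listing. The sketch below filters entries shaped like the "SecretList" items of a ListSecrets response (each with "Name" and "ARN"). The function name is hypothetical.

```python
def sagemaker_secrets(secret_list, prefix="AmazonSageMaker-"):
    """Return ARNs of secrets whose name starts with the SageMaker prefix."""
    return [secret.get("ARN") for secret in secret_list
            if secret.get("Name", "").startswith(prefix)]
```

The resulting ARNs are the secrets whose resource policies and access logs deserve the closest review.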

4b. Get SageMaker Secrets

Each secret has a resource IAM policy that enables only the Canvas application domain to access it.

Figure 27: Resource policy from Canvas-stored secret

The policy statement above enables retrieval of the secret only for the execution role tied to the domain and the Canvas application.

The AllowAWSServiceActions statement has secretsmanager:GetSecretValue permission, so retrieving the secret values can be a combination of:

  1. Calling the GetSecretValue API on found secrets from step 4a.
  2. Using attack vector number 2 to jump to the domain with the secrets

Mitigation Strategies

  • Implement strict IAM policies for Secrets Manager access
  • Use resource-based policies to limit access to secrets
  • Regularly rotate credentials and monitor secret access
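As an illustration of the second point, the sketch below builds a resource-based policy that denies GetSecretValue to every principal except an explicit list of role ARNs, using the aws:PrincipalArn global condition key. The function name and Sid are hypothetical. Note that a Deny like this also blocks administrators who are not on the list, so the allowlist needs to be chosen with care.

```python
import json


def restrict_secret_to_roles(allowed_role_arns):
    """Build a secret resource policy that denies reads from all other principals."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyAllExceptAllowedRoles",  # hypothetical statement ID
            "Effect": "Deny",
            "Principal": "*",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "*",
            # With a negated operator, the Deny applies when the caller's
            # ARN matches none of the listed values.
            "Condition": {"StringNotEquals": {
                "aws:PrincipalArn": sorted(allowed_role_arns)}},
        }],
    }


# Placeholder ARN for illustration.
EXAMPLE_SECRET_POLICY = json.dumps(
    restrict_secret_to_roles(["arn:aws:iam::111122223333:role/domain-a-execution"]),
    indent=2,
)
```

Because an explicit Deny overrides any Allow in an identity-based policy, this pattern holds even when the caller's role carries a broad managed policy.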

Attack Vector 5: AWS Resource Group Modification

Background

SageMaker Studio allows you to group and organize registered models into model collections across SageMaker environments. To use this feature, AWS requires a specific custom IAM policy attached to the current execution role or the use of the AmazonSageMakerModelRegistryFullAccess managed policy.

Figure 28: Model collection option from SageMaker Studio

The policy grants permissions to the AWS Resource Groups service. AWS Resource Groups enables grouping resources under a unified group and managing multiple resources through a single group. This is commonly used in automations, CloudFormation templates, AWS Systems Manager, and more.

Let's take a closer look into the policy:

Figure 29: Policy statement from AmazonSageMakerModelRegistryFullAccess

The policy statements in figure 29 permit users to create new resource groups or tag existing ones with the sagemaker:collection tag. They also allow the deletion of any resource group that carries this tag.

This setup poses a security risk. An attacker could tag non-SageMaker resource groups with sagemaker:collection, bringing them under the policy's scope. This means they could delete or alter these groups – even if they have nothing to do with SageMaker.

The core issue is that the policy checks only aws:TagKeys but doesn't validate aws:RequestTag or aws:ResourceTag. Without these additional checks, the policy lacks the precision to ensure that only intended resources are affected. Implementing more granular controls – such as verifying the actual tags being applied or the tags on the resources themselves – can enhance security and prevent unauthorized modifications.
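The gap described above can be checked mechanically. The following sketch inspects a single policy statement and reports whether it conditions on aws:TagKeys without also checking aws:RequestTag/... or aws:ResourceTag/... keys. The function name and the sample statements are hypothetical and are not copies of the managed policy.

```python
def tag_condition_gap(statement):
    """Return True if a statement checks aws:TagKeys but no tag values."""
    keys = set()
    for operator_block in (statement.get("Condition") or {}).values():
        # Condition key names are case-insensitive, so compare in lowercase.
        keys.update(key.lower() for key in operator_block)
    uses_tag_keys = "aws:tagkeys" in keys
    checks_values = any(key.startswith(("aws:requesttag/", "aws:resourcetag/"))
                        for key in keys)
    return uses_tag_keys and not checks_values


# Hypothetical statements for illustration only.
KEYS_ONLY = {
    "Effect": "Allow",
    "Action": ["resource-groups:Tag"],
    "Resource": "*",
    "Condition": {"ForAnyValue:StringEquals": {"aws:TagKeys": "sagemaker:collection"}},
}
KEYS_AND_RESOURCE_TAG = {
    "Effect": "Allow",
    "Action": ["resource-groups:DeleteGroup"],
    "Resource": "*",
    "Condition": {
        "ForAnyValue:StringEquals": {"aws:TagKeys": "sagemaker:collection"},
        "StringEquals": {"aws:ResourceTag/sagemaker:collection": "true"},
    },
}
```

Here tag_condition_gap(KEYS_ONLY) is True, while tag_condition_gap(KEYS_AND_RESOURCE_TAG) is False because the second statement also validates a tag on the resource.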

Attack Flow Summary

  1. Tag existing AWS Resource Group with sagemaker:collection
  2. Delete/modify the resource group

The attacker can then do one of the following:

    • Delete the tagged resource group
    • Disclose resources under the resource group
    • Delete and create a resource group under different resources

Mitigation Strategies

  • Adjust IAM policies to include specific conditions and resource restrictions
  • Employ tagging best practices to prevent unauthorized modifications
  • Monitor changes to resource groups

Message from Amazon Team

Amazon provides prescriptive guidance for all aspects of operational excellence, including the particular aspects of machine learning. When this guidance is followed, the vulnerabilities described here can be avoided. Start your journey with the Machine Learning Lens, Amazon’s well-architected guide for end-to-end machine learning. The foundation models of generative AI have unique needs that are covered in Amazon’s best practices for LLM Ops. Start with the Generative AI Security Scoping Matrix. When necessary, contact an AWS GenAI Security Maven who is continually trained to inform customers of up-to-date best practices.

This blog was also shared with our friends at Databricks and we appreciate their comments.

Mitigating SageMaker Security Risks

The attack vectors outlined in this blog underscore the importance of securing every aspect of the Amazon SageMaker environment. As AI and machine learning become increasingly integrated into business operations, safeguarding data and managing permissions meticulously are crucial. Misconfigurations and overprivileged access can lead to severe security breaches, making it imperative to stay vigilant. When using SageMaker, ensure that IAM roles are properly scoped, regularly reviewed, and audited, and stay informed about the latest security practices to protect your AI pipelines and sensitive data from potential threats.

Protecting your SageMaker environment requires a multifaceted approach that encompasses best practices in cloud security, AI model management and continuous monitoring. Here are some essential steps to safeguard your AI pipelines.

  • Review your SageMaker environment and the permissions in every domain.
  • The "Set up for organizations" option provides a range of restricted options based on the user persona's actual needs, including AWSSecurityAuditors, AWSServiceCatalogAdmins and AWSSecurityAuditPowerUsers. Choosing these options at the outset, as well as customizing least privileges, is an operational excellence best practice. That practice is not described in this blog. Instead, this blog describes the "Set up for single user (Quick setup)" option, which is not recommended for enterprise production use.
  • Use the Prisma Cloud Code to Cloud security platform to ensure the security and safety of your environment and clients.
  • Keep up to date with the latest security updates from AWS.

Learn More

Want to learn more about what Prisma Cloud can do? Book a personalized demo.

 

