Managing Secrets


Managing secrets and key material is something almost every organization struggles with. Whether it’s passwords, SSH keys, or certificates, chances are you have needed to securely load or use these secrets in your applications. For those of us without large budgets or long timelines, chances are you have also needed to get a new feature or service out as quickly as possible. We wanted to build a system that wasn’t detrimental to development timelines but offered enough security that we’d feel confident in the design. The Hakken Continuous Monitoring system is composed of many microservices, lambdas, and EC2 instances, so we wanted a design that would work in all of these conditions. This post covers the threats and solutions to this not-so-unique challenge. As an additional bonus, we created two packages to help organizations get a head start on removing hardcoded keys and secrets. You can find the Go package here and the Python package here.

Threats

Before we bother designing a secure system for managing our secrets, we need to look at the threats our systems may face. Many of you have probably asked the question: ‘If someone compromises the application, they will get the secret anyway, so why not just pass it in environment variables or store it on disk?’ While some information security practitioners will scream and yell that it should never be stored on disk, rarely do they offer solutions that match the actual threat. Some may recommend rather elaborate systems to protect secrets, yet these same people most likely have never actually tried to implement those solutions themselves. Some may say use HashiCorp’s Vault or a similar system. This misses the point that Vault requires a token (a secret!) which has to be stored… somewhere.

An excellent talk was given on this subject by the folks over at Netflix. But let’s face it, not all of us have Netflix’s budget. Going back to the question about secrets, the real answer to this problem is auditability. If someone steals a secret from disk and uses it, you may not have a clear method of auditing when it was used. Storing secrets in environment variables is worse, because now they exist in at least two places: in the process’s environment output, and either on disk or on the system that launched the process. Those environment variables have to come from somewhere.

However, if we store secrets in a system that logs access when key material is requested, we have a much better view into how and when our secrets are being used. Visibility is key; preventing access outright is most likely impossible. Even if you rotate your keys, chances are the attacker is already on a system that can access the key material, so they’ll simply grab the new key when it is rotated. Securely managing secrets is not just about theft prevention, it’s about visibility and alerting.

Design

So now we know what the threat is: someone steals our secrets and uses them without us knowing. How do we design our systems to handle this threat? For AWS we have an amazing set of systems: IAM policies, AWS SSM Parameter Store (backed by KMS), and CloudTrail. HashiCorp’s Vault would also work in non-AWS environments, except you still need to figure out how to audit access to its tokens. Hakken is based in AWS, so we will focus on using SSM, custom resources for CloudFormation, and CloudTrail.

Hakken is primarily composed of microservices. We have analysis modules which, for the most part, don’t require access to secrets since they just receive data over gRPC. Data services – services which talk to the database – need the username and password for the database. Each data service has its own database user, restricted to exactly the tables it needs to write to or reference. These services need the database connection string at service startup, but only then. This is the data we want to protect and audit access for.

Alternatives

It should be noted that this system was developed before AWS released Secrets Manager. If your organization is just starting to centralize secrets management, we’d recommend investigating it as well, since it offers additional features around secrets rotation. However, most of this post is still applicable whichever secrets store you choose.

AWS Systems Manager (SSM)

If you run AWS services and you aren’t aware of SSM, we’d strongly urge you to look at Parameter Store. This service allows us to store arbitrary data, encrypted with a KMS key. Parameter Store is a key/value (K/V) store where all access to parameters is logged in CloudTrail. This is what gives us the visibility we need. If you wanted to store a password for a system, you could simply write it to a key such as /env/system/password. Any time something accesses that key, you’ll have a log of the access.
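
To make this concrete, writing such a parameter from Go looks roughly like the following (a minimal sketch using the aws-sdk-go v1 API; the key name and value are placeholders and not part of Hakken):


// Sketch: store a SecureString parameter, encrypted with the default SSM KMS key.
// Assumes the aws, aws/session and service/ssm packages from aws-sdk-go are imported.
sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
svc := ssm.New(sess)

overwrite := true
_, err := svc.PutParameter(&ssm.PutParameterInput{
	Name:      aws.String("/env/system/password"), // placeholder key
	Type:      aws.String(ssm.ParameterTypeSecureString),
	Value:     aws.String("example-password"), // placeholder value
	Overwrite: &overwrite,
})
if err != nil {
	log.Fatalf("failed to store parameter: %v", err)
}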

Custom Resources

Custom resources are Lambda functions that CloudFormation can call to perform arbitrary computation during resource creation, updates, or deletion. Hakken uses binxio’s cfn-secret-provider to generate random passwords and SSH keys, storing the result in the SSM Parameter Store. This allows us to generate, encrypt, and store secrets in a well-known location per environment.


  EventServiceDBPassword:
    Type: Custom::Secret
    Properties:
      Name: !Sub '/xxx/${EnvironmentName}/db/eventservice/password'
      KeyAlias: alias/aws/ssm
      Alphabet: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
      Length: 30
      ReturnSecret: false
      ServiceToken: !Sub 'arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:binxio-cfn-secret-provider'

Using the above resource, we now have a randomly generated database user password stored in Parameter Store.

IAM Policies

While IAM policies are notoriously difficult to configure properly, configuring them for SSM is straightforward. Parameter Store keys are resources which can be restricted by an IAM policy. This allows us to define a very strict IAM policy stating that microservice X may only read the key defined as the resource /env/system/password. No other service would be granted access to this K/V pair. This gives us two things: auditability and least privilege. If microservice Y were compromised, it would not have a policy allowing it to read the key for microservice X, giving us separation of privileges. We use CloudFormation exclusively in Hakken so that we can have reproducible infrastructure; almost nothing is defined or used outside of CloudFormation templates. Let’s look at a microservice IAM Role definition:


  EventServiceReadSSMRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service: "ecs-tasks.amazonaws.com"
          Action: ['sts:AssumeRole']
      Path: /
      Policies:
      - PolicyName: !Sub "${EnvironmentName}-eventservice-query-dbstring"
        PolicyDocument:
          Statement:
          - Effect: Allow
            Action:
              - "ssm:Describe*"
              - "ssm:Get*"
              - "ssm:List*"
            Resource: !Sub "arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/linkai/${EnvironmentName}/db/eventservice/dbstring"

In the above role you can see this service has only one policy, granting it access to the parameter holding the database string it needs for the environment it is running in. At this point you may be wondering: how does our application actually access this? We will answer that in the next section.

Centralized Secrets Manager

Application developers should only be given a single package allowing them to access secrets. This ensures uniformity among all services. Let’s define an interface:


package secrets

type Secrets interface {
	GetSecureParameter(key string) ([]byte, error)
	SetSecureParameter(key, value string) error
}

Only two methods, getting and setting. Let’s look at the AWS implementation of this interface:


import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ssm"
)

type AWSSecrets struct {
	Region  string
	sess    *session.Session
	manager *ssm.SSM
}

// NewAWSSecrets returns an instance with optional region specified, otherwise uses us-east-1
func NewAWSSecrets(region string) *AWSSecrets {
	if region == "" {
		region = "us-east-1"
	}
	s := &AWSSecrets{Region: region}
	s.sess = session.Must(session.NewSession(&aws.Config{Region: aws.String(s.Region)}))
	s.manager = ssm.New(s.sess)
	return s
}

// GetSecureParameter retrieves the parameter specified by key, or error otherwise.
func (s *AWSSecrets) GetSecureParameter(key string) ([]byte, error) {
	decrypt := true
	parameter := &ssm.GetParameterInput{
		Name:           &key,
		WithDecryption: &decrypt,
	}
	out, err := s.manager.GetParameter(parameter)
	if err != nil {
		return nil, err
	}

	return []byte(*out.Parameter.Value), nil
}
...

Our AWSSecrets implementation creates an AWS session whose credentials come from the IAM role we defined for this container/task. The service calls GetParameter and tells it to decrypt the result. A helper type, SecretsCache, is built on top of this implementation to handle different environments and implementations of the interface:


func NewSecretsCache(env, region string) *SecretsCache {
	s := &SecretsCache{Environment: env, Region: region}
	if s.Environment != "local" {
		s.secrets = NewAWSSecrets(region)
	} else {
		s.secrets = NewEnvSecrets()
	}
	return s
}

// GetSecureString allows caller to provide the full key to return a string value
func (s *SecretsCache) GetSecureString(key string) (string, error) {
	data, err := s.secrets.GetSecureParameter(key)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

// DBString returns the database connection string for the environment and service
func (s *SecretsCache) DBString(serviceKey string) (string, error) {
	key := fmt.Sprintf("/linkai/%s/db/%s/dbstring", s.Environment, serviceKey)
	data, err := s.secrets.GetSecureParameter(key)
	if err != nil {
		return "", err
	}
	return string(data), nil
}
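
The SecretsCache type itself isn’t shown above; based on the constructor, a minimal sketch of it would look something like this (the field names are our assumption):


// SecretsCache wraps a Secrets implementation together with the environment
// and region it was created for (sketch only; the real definition may differ).
type SecretsCache struct {
	Environment string
	Region      string
	secrets     Secrets
}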

Now we can use this SecretsCache in multiple environments by passing in the environment name and an optional region. For local testing we can set environment variables instead of using SSM. Here’s the EnvSecrets implementation:


// NewEnvSecrets returns an instance
func NewEnvSecrets() *EnvSecrets {
	return &EnvSecrets{}
}

// GetSecureParameter retrieves the env variable specified by key, or error otherwise.
func (s *EnvSecrets) GetSecureParameter(key string) ([]byte, error) {
	key = strings.Replace(key, "/", "_", -1)
	data := os.Getenv(key)
	return []byte(data), nil
}

The EnvSecrets implementation replaces / with _ and reads secrets from environment variables. Developers set a variable such as export _linkai_local_db_eventservice_dbstring=XXX in their shell environment and it works seamlessly for accessing local database secrets during testing.
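
Putting it together, a service’s startup code might look something like the following (a hypothetical sketch; the APP_ENV and APP_REGION variable names are illustrative, not part of Hakken):


// Hypothetical service startup: resolve the environment, then fetch this
// service's DB connection string once via the secrets package.
env := os.Getenv("APP_ENV")       // e.g. "local", "dev", or "prod"
region := os.Getenv("APP_REGION") // optional, defaults to us-east-1

cache := secrets.NewSecretsCache(env, region)
dbstring, err := cache.DBString("eventservice")
if err != nil {
	log.Fatalf("failed to retrieve db connection string: %v", err)
}
// connect to the database using dbstring...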

Alerting

Now that we have an auditable trail for access of secrets, how should we alert? This depends heavily on your organization’s alerting capabilities and how you want to alert. You may want to alert on a combination of factors: different IAM users attempting to access, different locations, or even based on time of usage.

One possible solution is to monitor the CloudTrail event history, looking up the ECS task event SubmitTaskStateChange and ensuring it coincides with the Decrypt events recorded for that key. You could track when instances were last started, then alert if any Decrypt events are found outside of that window.


  cfg, err := external.LoadDefaultAWSConfig()
  if err != nil {
    panic("unable to load SDK config, " + err.Error())
  }

  svc := cloudtrail.New(cfg)

  event := cloudtrail.LookupAttribute{
    AttributeKey:   cloudtrail.LookupAttributeKeyEventName,
    AttributeValue: aws.String("SubmitTaskStateChange"),
  }

  input := &cloudtrail.LookupEventsInput{
    EndTime:          aws.Time(time.Now()),
    LookupAttributes: []cloudtrail.LookupAttribute{event},
    MaxResults:       aws.Int64(50),
    NextToken:        nil,
    StartTime:        aws.Time(time.Now().Add(-5 * time.Minute)),
  }

  req := svc.LookupEventsRequest(input)
  resp, err := req.Send()
  if err != nil {
    log.Fatalf("failed to send event lookup request: %v\n", err)
  }
  
  //... store time ...

  event = cloudtrail.LookupAttribute{
    AttributeKey:   cloudtrail.LookupAttributeKeyEventName,
    AttributeValue: aws.String("Decrypt"),
  }

  input = &cloudtrail.LookupEventsInput{
    EndTime:          aws.Time(time.Now()),
    LookupAttributes: []cloudtrail.LookupAttribute{event},
    MaxResults:       aws.Int64(50),
    NextToken:        nil,
    StartTime:        aws.Time(time.Now().Add(-5 * time.Minute)),
  }

  req = svc.LookupEventsRequest(input)
  resp, err = req.Send()
  if err != nil {
    log.Fatalf("failed to send event lookup request: %v\n", err)
  }
  
  //... compare ...

Monitoring could be its own service, or simply a lambda that is called every few minutes, handling alerts in any way your organization wants.
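
For example, a scheduled Lambda along these lines could run the comparison every few minutes (a hypothetical sketch using the aws-lambda-go package; the actual lookup-and-compare logic from the previous section would live inside the handler):


// Hypothetical sketch of a scheduled monitoring Lambda triggered by a
// CloudWatch Events rule (e.g. every five minutes).
package main

import (
	"context"
	"log"

	"github.com/aws/aws-lambda-go/lambda"
)

func handler(ctx context.Context) error {
	// Look up recent SubmitTaskStateChange and Decrypt events as shown
	// above, compare the time windows, and alert (SNS, Slack, etc.) if a
	// Decrypt falls outside a task start window.
	log.Println("running secrets access check")
	return nil
}

func main() {
	lambda.Start(handler)
}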

Conclusion

We hope this post helped developers understand why they should not store secrets locally (auditability and theft) and provided a concrete solution that can be used in their own AWS environments. We also hope this post shows that We Take Security Seriously™ here at linkai.

If you are interested in the Hakken service to continuously discover new web assets, open ports and dependencies on your externally facing systems, please contact us as we are actively looking for beta testers.