Engineering
Going on a Powertrip
Ashley Mathew
Product & Engineering
If you have a product and customers, then you also have members of your team who need access to critical systems in order for your company to function. Safeguarding the credentials that can access these systems, with mechanisms like 2FA, U2F, and key rotation, is necessary but not sufficient. You must also monitor for key security events and review each one to ensure your protections are working as intended.
Here at Aptible, we’ve solved the problem of monitoring and requesting approval for security events via a lightweight Slack integration we built called Powertrip. With Powertrip, we are able to send Slack notifications to relevant team members about key security events within minutes of the event happening.
To get a better picture of how this works, let’s review how Aptible tackled monitoring SSH login activity using Powertrip.
Monitor for logins
The first step is setting up monitoring for the logins. This can take two forms:
Querying a source for login activity
Receiving login activity from a source via webhook
At Aptible we use both approaches across our security program, but for alerts that benefit from being as close to real time as possible, we’ve opted to rely on webhooks wherever we can. There are a few reasons for this:
Webhooks are supported by most of our data sources, like our logging provider and Amazon CloudWatch
Relying on webhooks makes it far easier to determine what data we have processed and what data still need to be processed
Most sources that support webhooks also have logs available, meaning that we can easily use those logs to review and process any missed events, if needed
Our monitoring needs to provide us with enough information to successfully determine who triggered an event, when the event happened, and what the event actually was. For SSH logins, this means knowing who logged in, which server they logged into, and when the login happened.
OpenSSH logs a message for each successful SSH connection. Using webhooks offered by our logging provider, our application can be notified about each successful connection logged by OpenSSH. Each notification sent by our logging provider includes the original OpenSSH log message.
Conveniently, this log message contains much of the contextual information we need to create a useful notification, in a standard format that’s easy to parse. With one simple regex, we know who logged in and from what IP address.
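As a rough illustration of that parsing step (not Powertrip's actual code), the sketch below assumes the message follows OpenSSH's usual `Accepted <method> for <user> from <ip> port <port>` shape. The sample line, the process ID, the port, and the function name `parse_ssh_login` are made up for the example.

```python
import re

# Assumed shape of sshd's success message. The sample line further down is
# illustrative and is not the log line from the original article.
ACCEPTED_RE = re.compile(
    r"Accepted (?P<method>\S+) for (?P<user>\S+) "
    r"from (?P<ip>\S+) port (?P<port>\d+)"
)


def parse_ssh_login(message):
    """Return the user and source IP from an sshd 'Accepted' line, or None."""
    match = ACCEPTED_RE.search(message)
    if match is None:
        return None  # not a successful-login line
    return {"user": match.group("user"), "ip": match.group("ip")}


sample = "sshd[2101]: Accepted publickey for oswaldo from 10.0.0.1 port 51234 ssh2"
print(parse_ssh_login(sample))  # {'user': 'oswaldo', 'ip': '10.0.0.1'}
```

Returning `None` for lines that don't match lets the caller decide what to do with messages it can't process, rather than failing silently.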
Our logging provider also provides some extra contextual information, including the timestamp of the event and the hostname of the server that logged the message, so that we can answer the major questions the log line can’t: which server was logged into and when.
By setting up one webhook and using one regex, we now have a useful security event! We can now tell that oswaldo logged in from IP 10.0.0.1 to the server bastion.test at 1:02 UTC.
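To make the combination concrete, here is a minimal sketch of merging the parsed fields with the provider's metadata. The payload keys `hostname` and `timestamp`, the ISO 8601 timestamp format, the calendar date, and the names `LoginEvent` and `build_event` are all assumptions for illustration. Real webhook payloads vary by logging provider.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LoginEvent:
    user: str
    source_ip: str
    hostname: str
    occurred_at: datetime  # assumed to be timezone-aware

    def summary(self):
        """One-line description answering who, from where, to where, and when."""
        when = self.occurred_at.astimezone(timezone.utc)
        return (f"{self.user} logged in from {self.source_ip} "
                f"to {self.hostname} at {when:%H:%M} UTC")


def build_event(parsed, payload):
    """Combine regex output with provider metadata (hypothetical field names)."""
    occurred_at = datetime.fromisoformat(payload["timestamp"])
    return LoginEvent(parsed["user"], parsed["ip"], payload["hostname"], occurred_at)


event = build_event(
    {"user": "oswaldo", "ip": "10.0.0.1"},
    # The date here is invented; only the time of day comes from the example.
    {"hostname": "bastion.test", "timestamp": "2024-01-01T01:02:00+00:00"},
)
print(event.summary())
```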
Notify someone about the login
Now that we know an event happened, we need to tell someone about it. For SSH events, it makes the most sense to send a private message directly to the Slack user associated with the login, since that person is the most likely to know whether the SSH activity is legitimate. To do this, we keep a map from each Unix username to the associated Slack user.
Events can’t always be tied back to a specific person, however. In this example, oswaldo is a bot, not a person. That said, oswaldo is a bot with a wide range of access, so it’s still important to monitor oswaldo’s activity. Since there’s not a single person we can DM to approve oswaldo’s activity, we instead push the Slack notification to a channel monitored by our entire security team.
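The routing rule described above can be sketched in a few lines. The channel name, the Slack user IDs, and the function name `notification_target` are hypothetical placeholders, not values from Aptible's setup.

```python
SECURITY_CHANNEL = "#security-alerts"  # hypothetical channel name

# Hypothetical mapping from Unix usernames to Slack user IDs.
# Bots such as oswaldo are deliberately absent from the map.
SLACK_USERS = {"alice": "U0000001", "bob": "U0000002"}


def notification_target(unix_user, slack_users=SLACK_USERS):
    """DM the mapped person if there is one, otherwise use the team channel."""
    return slack_users.get(unix_user, SECURITY_CHANNEL)


print(notification_target("alice"))    # U0000001
print(notification_target("oswaldo"))  # #security-alerts
```

Falling back to the channel for any unmapped username also covers accounts nobody remembered to add to the map.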
We don’t want to assume that just because we sent a message via DM or to a channel that someone saw the notification. Instead, we require approval of each notification using Slack's message buttons feature.
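For reference, a message with an approval button might be built roughly like this, using the attachment-based format Slack documents for message buttons. The wording, the `callback_id` scheme, and the function name `approval_message` are assumptions. The sketch only constructs the payload and leaves sending it, and handling the button click, to the surrounding application.

```python
import json


def approval_message(channel, summary, event_id):
    """Build a Slack message payload with a single Approve button."""
    return {
        "channel": channel,
        "text": summary,
        "attachments": [{
            "text": "Was this activity expected?",
            "fallback": "Open Slack in a client that supports buttons to approve.",
            # Hypothetical scheme for tying the click back to the stored event.
            "callback_id": f"approve_event:{event_id}",
            "attachment_type": "default",
            "actions": [{
                "name": "decision",
                "text": "Approve",
                "type": "button",
                "style": "primary",
                "value": "approve",
            }],
        }],
    }


message = approval_message(
    "#security-alerts",
    "oswaldo logged in from 10.0.0.1 to bastion.test at 01:02 UTC",
    "evt-1",
)
print(json.dumps(message, indent=2))
```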
And when that doesn’t work, notify again
Sometimes, one notification isn’t enough. Maybe oswaldo has a task to run at 1 AM, when the entire security team is asleep, or a person’s account is compromised while they are on vacation. In these cases, it’s even more important that the notifications aren’t missed. This is why we periodically re-send Slack notifications for any events that haven’t been approved. When re-sending notifications, we always send to the channel monitored by the security team, even if we can map the activity to a specific user. This keeps us from missing malicious activity just because someone is on vacation.
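A periodic job implementing this re-send rule could look something like the sketch below. The 30-minute interval, the channel name, and the event fields (`id`, `approved`, `last_notified_at`) are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

RESEND_INTERVAL = timedelta(minutes=30)  # assumed interval
SECURITY_CHANNEL = "#security-alerts"    # hypothetical channel name


def due_for_resend(events, now, interval=RESEND_INTERVAL):
    """Return (event_id, target) pairs for unapproved events past the interval.

    Re-sends always target the security channel, even when the original
    notification went to an individual by DM.
    """
    due = []
    for event in events:
        if event["approved"]:
            continue
        if now - event["last_notified_at"] >= interval:
            due.append((event["id"], SECURITY_CHANNEL))
    return due


now = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
events = [
    {"id": "evt-1", "approved": False, "last_notified_at": now - timedelta(hours=1)},
    {"id": "evt-2", "approved": True, "last_notified_at": now - timedelta(hours=1)},
    {"id": "evt-3", "approved": False, "last_notified_at": now - timedelta(minutes=5)},
]
print(due_for_resend(events, now))  # [('evt-1', '#security-alerts')]
```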
One monitoring source isn’t always enough
With the few easy steps above, Powertrip sends us Slack notifications for all SSH activity across all of our critical systems, which is a huge boost to our security program. But all of this falls apart if we receive a log message Powertrip can’t process, or if Slack is down. It’s difficult to cover every possible failure condition, but in areas where the risk is high, it’s useful to have a backup plan.
For SSH activity, Aptible’s Host Intrusion Detection System (HIDS) also records every successful login. We track approvals that happen through Powertrip, and any login already approved in Powertrip is auto-approved in HIDS. Having the additional monitoring from HIDS means that if a Slack notification is missed for any reason, a member of our security team will still see and sign off on the SSH activity as part of the regular HIDS event review process.
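One way to picture this reconciliation is as a split of HIDS login records into those already approved through Powertrip and those still awaiting manual review. The record fields and the exact-match key below are assumptions. A real system would likely need a more tolerant match, since two sources rarely record identical timestamps.

```python
def reconcile(hids_logins, powertrip_approved):
    """Split HIDS login records into auto-approved and needs-review lists.

    powertrip_approved is a set of (user, hostname, timestamp) keys for
    logins that were approved through Slack (hypothetical key format).
    """
    auto_approved, needs_review = [], []
    for login in hids_logins:
        key = (login["user"], login["hostname"], login["timestamp"])
        if key in powertrip_approved:
            auto_approved.append(login)
        else:
            needs_review.append(login)
    return auto_approved, needs_review


hids_logins = [
    {"user": "oswaldo", "hostname": "bastion.test", "timestamp": "2024-01-01T01:02:00Z"},
    {"user": "alice", "hostname": "bastion.test", "timestamp": "2024-01-01T09:30:00Z"},
]
approved = {("oswaldo", "bastion.test", "2024-01-01T01:02:00Z")}

auto, review = reconcile(hids_logins, approved)
print(len(auto), len(review))  # 1 1
```

Anything left in the review list is exactly the activity that a missed or unprocessed Slack notification would otherwise have hidden.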
Monitor anything
Using Powertrip to track and approve SSH logins is easy to set up, makes approving activity easy for team members, and is a huge boost to our security program. Since it’s so easy, why stop with SSH activity? The same approach works for monitoring a wide range of other activities, including CloudTrail events, Aptible Operations, and anything you can write a custom log line for.