A Day in the Life of a Palo Alto Networks Administrator

Security is hard, and here is a great example of why. I am using Amazon WorkSpaces for this example because I recently ran up against this problem, but a similar issue could happen during any cloud software product’s normal lifecycle.

Amazon WorkSpaces is a service that provides managed remote desktops based on Server 2012 or Server 2016. It uses Teradici’s PCoIP technology as the transport for the screens which gives it a competitive advantage over similar services. The close integration with other AWS services provides additional benefits. For example, an EC2 server can be on the same local network as a WorkSpace environment so that file shares and even older client/server-based applications can operate at high speed on a local network that costs you nothing in bandwidth. It is a great way for businesses to move legacy applications to the cloud.

In my lab environment, the Amazon WorkSpaces client suddenly stopped connecting to the cloud service. What follows is the sequence of steps I used to troubleshoot the failure, determine where the problem was, and resolve it, all while maintaining SSL Decryption for as much traffic as I could. In other words, I kept security as a primary concern throughout.

Issue: I can no longer connect to Amazon WorkSpace.

The Amazon WorkSpaces client gave two different error messages when attempting to connect. The first stated that the client couldn’t start the instance. The second stated that the service was unavailable. The client has a button in the lower right-hand corner next to the word “Network.” Clicking that button brings up a screen that checks several levels of connectivity, making troubleshooting easier for those technically inclined. In order: Network Connection; Internet Connection; WorkSpaces service; TCP Port 4172; UDP Port 4172; Round Trip Time. The client was failing on WorkSpaces service.
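The TCP Port 4172 check in that list can be roughly reproduced outside the client, which helps separate a client problem from a network problem. The following is a minimal sketch, not what the WorkSpaces client itself does; the gateway host name in the commented usage is a hypothetical placeholder. UDP 4172 cannot be verified this way because UDP has no handshake to confirm.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the full TCP handshake, then we close it.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example usage (hypothetical host name, substitute one your client really uses):
# for port in (443, 4172):
#     print(port, tcp_port_open("workspaces-gateway.example.com", port))
```

A result of False only says the handshake did not complete from this machine; it does not say whether the firewall, the path, or the far end is responsible.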

Is the WorkSpace still up?

The first steps in most troubleshooting are usually the easiest to test and typically fix a wide variety of problems. “Try rebooting.” Logging into the AWS Console shows that the WorkSpace is indeed powered on and reporting its status as Healthy. Selecting it and choosing “Reboot” gives us a 5 to 10 minute reboot cycle that historically fixes a wide range of issues even if the instance is reporting healthy. After the reboot, the problem remains.

Can I connect from another machine?

I have to make sure there is nothing wrong with my local machine. Had this been in production I would likely have a few users to test with. I could have more easily known whether this was a global issue across the business or a local issue with one user and this may have been the most obvious thing to check. In my lab environment, given that WorkSpaces has problems more frequently than my MacBook Pro, this was the second most obvious issue. A reboot of the local machine made no difference in the client’s behavior. Checking for updates for the WorkSpaces client shows there is no newer version available. I used remote desktop to log into a local PC with the client and it experienced the same issue. Now I’ve validated that my Mac is not the problem.

Is something being blocked on the Palo Alto?

A quick look at the Traffic log under the Monitor tab shows that nothing unexpected is being blocked. Per best practice, I do have SSL Decryption enabled, and I do notice a lot of AWS traffic that is not identified as Amazon WorkSpaces that is being decrypted.

Does it work if I bypass the Palo Alto?

I enabled a VPN service that I typically use while on public networks so that I can bypass the Palo Alto. Once the session is established, the client connects just fine. I make the full connection to the WorkSpace just to make sure everything is fine there. Then I disable the VPN. Normally the WorkSpaces client handles network transitions remarkably well; this time, however, the connection immediately dropped. The problem definitely seems to point towards a network disruption between my lab network and the AWS WorkSpace. The next most obvious place to check is the firewall. I have already checked it for blocking events, so the cause must be something other than an active block.

Does it work if I turn off decryption?

Other than an active block, the most common cause of connectivity issues on a Palo Alto is decryption. An increasing number of services and applications won’t tolerate a man-in-the-middle, even if it is sanctioned with a trusted private CA in the Root Certificate Store of the computer. Unfortunately, decryption is also one of the most important things to enable to get the full effectiveness of any next-gen firewall solution. I added my MacBook Pro to an address group on the Palo Alto that is not decrypted. I use this group for devices on which I cannot, or do not want to go through the trouble to, install the Palo Alto’s CA certificate. After a commit the WorkSpaces client connects right away with no problems. At this point I have isolated the problem to the Palo Alto and specifically to SSL decryption. However, it is not acceptable to leave this machine in a state where SSL decryption is disabled. WorkSpaces has always connected in the past with SSL decryption enabled, so the evidence points to something having changed.
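One way to confirm from the client side whether a given connection is being decrypted is to look at who issued the certificate the client receives: a decrypted session is re-signed by the firewall’s CA rather than by a public CA. The sketch below uses Python’s standard `ssl` module and assumes that the trust store Python uses includes the firewall’s CA (otherwise the handshake fails with a verification error, which is a hint in itself). The CA name passed in is whatever you named your own decryption CA; nothing here is specific to Palo Alto.

```python
import socket
import ssl
from typing import Optional


def issuer_common_name(cert: dict) -> Optional[str]:
    """Pull the issuer CN out of the dict returned by SSLSocket.getpeercert()."""
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def is_decrypted(cert: dict, firewall_ca_cn: str) -> bool:
    """True when the presented certificate was issued by the firewall's CA."""
    return issuer_common_name(cert) == firewall_ca_cn


def fetch_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Complete a verified TLS handshake and return the peer certificate dict."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


# Example usage (the CA name is an assumption, use your own):
# print(is_decrypted(fetch_cert("docs.aws.amazon.com"), "Lab Firewall CA"))
```

Running this before and after moving a machine into the non-decrypted group should show the issuer switch from the firewall’s CA to a public one.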

Did something change with Palo Alto?

I searched through the Palo Alto release notes for the latest dynamic updates and did not find any mention of Amazon WorkSpaces. I also checked the recent configuration change log on my Palo Alto to make sure that I hadn’t changed anything myself that might have caused this issue.

Did something change with the Amazon WorkSpace client?

The Amazon WorkSpaces client is on a relatively rapid release schedule. I found the release notes for the latest Amazon WorkSpaces client and saw a vague reference to the Windows version of the client installer checking for a Starfield root CA on the host machine and adding it if it doesn’t already exist there. This makes me suspect that something has changed in the WorkSpaces client’s protocol to require that certificate, but the description is so vague that there is no way to know exactly what. In Palo Alto under Device -> Certificate Management -> Certificates -> Default Trusted Certificate Authorities, I check and see that some Starfield root CAs are listed. I do not have enough details yet to know if these are the proper certificates, but there is no immediate concern that I might be missing a trusted CA that WorkSpaces expects to find. Palo Alto is configured to sign SSL traffic from untrusted sources with a different CA that the clients do not trust. In this way certificate errors can be passed through to the client. If it were not configured this way, bad certificates would be re-signed with the Palo Alto’s trusted CA, the clients would only see that the certificate was trusted, and users would not be warned of any issues, such as expiration, with the original certificate. While I am looking at the Default Trusted Certificate Authorities list, however, I notice right below those Starfield CAs some trusted StartCom CAs, which have been deprecated by many browsers. That concerns me, but I make a mental note to look at it later and do not let myself be sidetracked.

Can I identify the WorkSpace traffic?

Palo Alto has an App-Id defined for WorkSpaces traffic. It also has an SSL Decryption Exclusion for it (connectivity.amazonworkspaces.com). My suspicion is that either the traffic changed enough that it is no longer being identified as WorkSpaces traffic by App-Id, or it is using a new hostname that the SSL Decryption Exclusion does not cover, or both. I open the Monitor tab and start looking at traffic again. I filter the log on my Mac, which has the client running and continually going through its network checks. My hope is to be able to visually identify these connection attempts in the log. Very quickly I realize that there is no way I can distinguish one type of AWS traffic from another. So many programs and services on my laptop are using AWS that it is just a huge mess of entries. I need to isolate this further. By also filtering on traffic that App-Id identifies as WorkSpaces I can see that none of it is being decrypted, so I can exclude that traffic with a filter. When I filter only on decrypted traffic, however, there are still too many things hitting AWS to make any sense of it. It is a little surprising how many different applications use an AWS service to check for updates, phone home, or more. For example, I notice a Canadian-based cloud service that I subscribe to hitting AWS resources in the United States. Another mental note: this is also concerning.

How can I identify the WorkSpace traffic?

So far I am not having much luck, but I must isolate this traffic so that my Decryption exclusion does not also cover AWS traffic that has nothing to do with the WorkSpaces client. There is a great little program for Mac called Little Snitch, a personal firewall with easy and granular controls that can be set based on many different factors and can block both incoming and outgoing traffic. Using this software I deny traffic for every service and application on my laptop except for traffic destined for the local network or going to/from the WorkSpaces client application. Now when I look at the Palo Alto Monitor tab filtered only on my MacBook Pro I can see a repeating pattern of App-Id’d WorkSpaces traffic coupled with an SSL request on port 443 to an IP address in Amazon’s range. As luck would have it, it is the same IP address on each request.
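Once the noise is gone, the repeating pattern can also be pulled out of exported log data rather than spotted by eye. This is only a sketch: it assumes the traffic log has been exported and loaded as a list of dictionaries, and the field names `src`, `dst`, and `dport` are hypothetical stand-ins for whatever columns your export actually contains.

```python
from collections import Counter


def tally_destinations(rows, src, dport=443):
    """Count connections from one source to each destination on one port.

    rows  -- iterable of dicts with hypothetical keys "src", "dst", "dport"
    src   -- source address of the isolated client machine
    dport -- destination port of interest (443 for the SSL requests)
    Returns (destination, count) pairs, most frequent first.
    """
    counts = Counter()
    for row in rows:
        if row["src"] == src and int(row["dport"]) == dport:
            counts[row["dst"]] += 1
    return counts.most_common()
```

A destination that dominates the tally while everything else on the machine is blocked is a strong candidate for the connection the client keeps retrying.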

SSL Decryption exclusions by IP?

I set up a Decryption Policy (Policies -> Decryption) that excludes the IP address I am seeing from decryption. Before the commit has finished, however, I notice that the client is now trying another IP address. I add this one as well, and the process repeats. While continuing to observe, I noticed four different IP addresses it was rotating through. I was hoping that there would be a single endpoint or two specific to our WorkSpace instance, but this does not appear to be the case. With all four IP addresses in the Decryption Policy set to not decrypt, the client works perfectly on both my local machines. I connect and disconnect multiple times to make sure. While I could leave the configuration as-is and exclude all four of these IP addresses, there is no way to ensure that these four IP addresses will always be the same, and I would probably need to revisit this issue if I left it this way. I need to solve this a better way.
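To get a feel for why widening the exclusion from single addresses to whole AWS prefixes is no better, the observed addresses can be looked up in the address ranges AWS publishes at https://ip-ranges.amazonaws.com/ip-ranges.json. The sketch below assumes that file has already been downloaded and parsed into a dictionary; it relies on the file’s `prefixes` list with its `ip_prefix`, `region`, and `service` fields, and only handles the IPv4 list.

```python
import ipaddress


def matching_prefixes(ip: str, ranges: dict) -> list:
    """Return the published IPv4 prefixes that contain the given address.

    ranges -- parsed contents of AWS's ip-ranges.json
    Each hit records the prefix, its region and service tags, and how many
    addresses the prefix spans.
    """
    addr = ipaddress.ip_address(ip)
    hits = []
    for entry in ranges.get("prefixes", []):
        net = ipaddress.ip_network(entry["ip_prefix"])
        if addr in net:
            hits.append({
                "prefix": entry["ip_prefix"],
                "region": entry["region"],
                "service": entry["service"],
                "size": net.num_addresses,
            })
    return hits
```

If the containing prefix spans thousands of addresses shared by unrelated customers, excluding it from decryption would exempt far more than the WorkSpaces client.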

All target IPs observed have the same SSL cert name.

Using a web browser, I made an HTTPS request to each of the four IP addresses I had observed. Inspecting the certificate on each shows the same DNS name of skylight-cm.us-east-1.amazonaws.com. Under Device -> Certificate Management -> SSL Decryption Exclusion I add this hostname and set it to Exclude from decryption. I also removed the four IP addresses I had previously excluded from decryption. After a commit the WorkSpaces client still works without issue, but I am no longer tied to just those four IP addresses. Amazon can (and most certainly will) move those IP addresses all around, and my configuration won’t be bothered so long as the DNS name of the certificate is consistent.
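The browser check can be scripted when there are more than a handful of addresses to inspect. The following sketch connects to a bare IP address, verifies the certificate chain but skips the hostname match (there is no hostname to match when connecting by IP), and lists the names the certificate carries. It assumes the certificate chains to a CA that Python’s trust store accepts; the address in the commented usage is a documentation placeholder, not one of the addresses I observed.

```python
import socket
import ssl


def names_from_cert(cert: dict) -> list:
    """Collect the subject CN and DNS subjectAltNames from a getpeercert() dict."""
    names = []
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS" and value not in names:
            names.append(value)
    return names


def cert_names_for_ip(ip: str, port: int = 443, timeout: float = 5.0) -> list:
    """Handshake with an IP address and return the names on its certificate."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # chain is still verified, only the name match is skipped
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock) as tls:
            return names_from_cert(tls.getpeercert())


# Example usage (placeholder address):
# print(cert_names_for_ip("192.0.2.10"))
```

If every observed address returns the same name, that name is a much more stable thing to exclude than the addresses themselves.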

Success!

If you arrived at this article because you are trying to solve a similar problem, please be aware that the DNS name above will not be the same for your AWS WorkSpace instance if you are in a different region. Now that I have the name it is using, I was able to search for it and found the following article in the AWS documentation. Please see the article to determine your domain: https://docs.aws.amazon.com/workspaces/latest/adminguide/workspaces-port-requirements.html

Lessons Learned and Follow-up

A search for the hostname I excluded shows that this DNS name has been in existence since at least 2015. Since the WorkSpaces client has worked with the same Palo Alto configuration, including SSL decryption, in my environment for over a year, I am pretty sure that the client was recently changed to use an internally pinned certificate rather than trusting the system’s root certificate store.

My troubleshooting process was about as efficient as I could expect. From start to finish the entire exercise took a little less than two hours. It took longer to write this article. A faster commit cycle on my lab Palo Alto might have sped things up, as would having Little Snitch preconfigured so that I could quickly set up this kind of experiment. Had I not had a Mac, I would have had to figure out an equivalent method on a Windows machine. Or I might have used tcpdump, Wireshark, or Process Explorer to identify the traffic locally rather than using the firewall.

In an ideal world, products like WorkSpaces would convey a protocol change prior to pushing it out, or provide a beta program where such an issue could be identified prior to it being pushed into production. However, even if this were done I can imagine that this issue still might have slipped through the cracks until it showed itself in production. More than anything else, however, I got the impression that I needed to share this experience.

What I want to convey is that doing security right is hard. The easiest and fastest solution here would have been to disable SSL Decryption completely, or to make a sweeping exclusion for a wide range of AWS IP addresses. Indeed, if this had happened in a production environment I would have been forced into that solution at least temporarily, because two hours of unplanned downtime is an hour and 59 minutes too long. That also assumes that when the problem started I was available to begin troubleshooting immediately, and that I wasn’t in the middle of troubleshooting and fixing something else, or otherwise indisposed.

Multiply this problem by every application your organization uses and by every time one of those programs updates its client or protocol, and you begin to see the extent of effort that goes into doing security right. SSL Decryption is a best practice in Palo Alto and any other “next generation” firewall; however, the administrative burden of running it goes well beyond just pushing out the CA certificate to your environment and enabling the feature. Most of our customers choose not to enable SSL decryption because of this administrative burden. Only the larger customers have the staff trained and available to deal with an issue like this.

Since I also enjoy thinking like an attacker, some deeper issues come to mind that a true adversary could leverage when you are forced to bypass SSL decryption as I was in this example. According to AWS documentation, every WorkSpaces instance in us-east-1 gets the same DNS name for the server in this exception. Also, the WorkSpaces traffic itself is not inspected because it cannot be decrypted. Together, these mean that there is no way to determine whether WorkSpaces traffic is solicited or not with the current rules in place. For $35/month, anyone behind my firewall could create their own instance of WorkSpaces in the us-east-1 region and use that instance to exfiltrate data from my network or introduce something malicious. The traffic on port 4172 goes to random EC2 IPs, so it isn’t even possible to create a list of sanctioned IP addresses, and there is really no way to determine from the firewall whether a WorkSpaces connection is sanctioned or not. There are probably some ways around this with additional network configuration on the AWS side that I haven’t investigated. I hope, anyway.

When cloud services allow some of their traffic to be decrypted, it allows the administrator to have some visibility into what is happening. For example, Palo Alto makes it possible to distinguish between Microsoft Office business and personal accounts because Microsoft does not make it impossible to decrypt their traffic. This enables you to block personal OneDrive accounts while still allowing business OneDrive. That will solve most casual employee abuse of policy that requires data to be stored only on the business account. For $5/month for Office 365 Business Essentials, anyone can create their own single-user Microsoft Office business account complete with 1TB of space and connect to this from behind your firewall. Or Outlook Web Access with the same account for personal webmail.

Although you do not have the ability to quickly and easily differentiate between your business account and the attacker’s business account, such inspection is possible. Decryption also allows you to see and inspect the files and data that are transmitted. In this way you can still inspect, alert on, or block sensitive information leaving (or malicious information entering) your network.