WHfB and Zscaler – Local error

In one of our recent projects, we encountered a puzzling challenge that left both Microsoft and our team scratching their heads. Join us as we dive into the specific environment details and unravel the complexities of Windows 11 22H2 Enterprise AAD, Intune management, White Glove Autopilot enrollment, and Windows Hello for Business with Cloud Trust.

Environmental Specifics

Windows 11 22H2 Enterprise AAD joined and fully managed by Intune
Devices enrolled using White Glove / Pre-provisioning Autopilot
Windows Hello for Business configured with Cloud Trust
Zscaler with ZIA and ZPA (Private tunnel with a line of sight to the Domain controllers)
Carve-out project with a greenfield deployment, including migration of business applications that use Kerberos authentication
Presence of DFS fileservers

The Issue: DFS Access and Authentication Challenges

During our project, we encountered a perplexing issue where users, while logging into their devices with Windows Hello for Business, faced difficulties accessing DFS local shares. They encountered incorrect password prompts, but after approximately 10 minutes, the access would start functioning correctly. This inconsistency affected users randomly.

Troubleshooting Journey

Analyzing TGT Tickets: Upon running the “klist tgt” command, we noticed that no tickets were being generated, which hinted at a potential underlying cause.
Failed Authentication: Immediately after device startup or reboot, local applications failed to authenticate. After approximately 10 minutes, however, they began functioning normally.
Zscaler Investigation: Although we suspected Zscaler’s involvement, we couldn’t definitively pinpoint it as the root cause.
A Wild Guess: Considering the symptoms, we speculated that the TGT ticket might be getting corrupted during the issue and reissued after a specific interval, leading us to formulate a testing plan.

To test our hypothesis, we ran the following command as administrator right after the machine was started or rebooted:

klist purge_bind

Boom! It started to work.

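This means that when the client requests its Kerberos ticket, the ZPA tunnel has not come up yet, so the request fails. klist purge_bind clears the cached domain controller binding, which forces the client to locate a domain controller again, and by that point the tunnel is available. By default the machine retries on its own after 10 minutes, an interval set by the FarKdcTimeout registry value (under HKLM\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters).

When it retries after 10 minutes, the ZPA tunnel is already established, so the client gets a new ticket and everything works.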
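
One way to put a number on the timing gap described above is to measure how long after logon a domain controller becomes reachable on the Kerberos port (88). The Python sketch below is not part of the original troubleshooting. It assumes that a plain TCP connect to port 88 is a reasonable signal that the ZPA tunnel is up, which may not hold in every Zscaler configuration, and the host name in the usage comment is a placeholder.

```python
import socket
import time


def kdc_reachable(host, port=88, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def seconds_until_reachable(host, port=88, give_up_after=120.0, interval=1.0):
    """Poll until host:port answers.

    Returns the elapsed seconds, or None if the port never answered
    before give_up_after seconds had passed.
    """
    start = time.monotonic()
    while True:
        if kdc_reachable(host, port):
            return time.monotonic() - start
        if time.monotonic() - start >= give_up_after:
            return None
        time.sleep(interval)


# Usage (placeholder host name, replace with one of your domain controllers):
#   print(seconds_until_reachable("dc01.corp.example"))
```

Run right after logon, a result of several seconds would be consistent with the tunnel coming up later than the first ticket request, and it gives a rough basis for choosing the delay before purging the binding.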

Resolution

Since there was no way to speed up the ZPA connection, we settled on running klist purge_bind as a scheduled task 10 seconds after user logon.
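
The post does not show how the task was created, so the following is only a sketch of one way to register it, written in Python around the built-in schtasks command. The task name, running the task as SYSTEM with highest privileges, and the helper function names are all assumptions for illustration. The /DELAY value uses the mmmm:ss format that schtasks expects for logon triggers.

```python
import subprocess
import sys


def build_schtasks_command(task_name="Purge Kerberos DC binding", delay_seconds=10):
    """Build the schtasks argument list for a logon-triggered klist purge_bind."""
    minutes, seconds = divmod(delay_seconds, 60)
    return [
        "schtasks", "/Create",
        "/TN", task_name,                          # hypothetical task name
        "/TR", "klist.exe purge_bind",             # the command from the post
        "/SC", "ONLOGON",                          # trigger at user logon
        "/DELAY", f"{minutes:04d}:{seconds:02d}",  # mmmm:ss, 10 s -> 0000:10
        "/RU", "SYSTEM",                           # assumption: run as SYSTEM
        "/RL", "HIGHEST",                          # assumption: elevated
        "/F",                                      # overwrite an existing task
    ]


def register_task(task_name="Purge Kerberos DC binding", delay_seconds=10):
    """Create the scheduled task. Must be run elevated on Windows."""
    if sys.platform != "win32":
        raise RuntimeError("schtasks is only available on Windows")
    subprocess.run(build_schtasks_command(task_name, delay_seconds), check=True)


# Usage, from an elevated session on the client:
#   register_task()
```

Whether purging the binding from the SYSTEM context has the same effect as running it as administrator in the user's session should be confirmed on a test device before rolling this out through Intune.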
