In the August 2015 cumulative update for SharePoint 2013 (and in SharePoint 2016), Microsoft released a powerful new capability: the Cloud Search Service Application (SSA). The Cloud SSA is a hybrid technology that finally delivers what most hybrid search users were looking for: a unified index for unified results. In this mini-series, I’m going to talk about two important limitations of the Cloud SSA that I’ve discovered recently. In this article, we’ll talk about a limitation with using alternative identity providers in your on-premises sites. In the second article, we’ll address a limitation with alternative URLs.
Beware: The following is going to get very technical. You’ve been warned! :)
Hybrid Search Review
Let’s do a little review of hybrid search in SharePoint.
Traditionally (wow, crazy to think of something only a few years old as “traditional”), hybrid searches have relied upon the concept of “query federation.” With query federation, the query component sends a query to the on-premises index but also to one or more external indexes. The external index can be pretty much anything that supports the OpenSearch protocol. In the Office 365 (O365) space, query federation was primarily done with SharePoint Online (SPO) in order to bring O365 results into an on-premises search experience (or on-premises results into an SPO search). This is now known as “classic hybrid.” (Again, strange that something so recent is now known as “classic.”) The primary problem with classic hybrid search is that each query was separate from the other, meaning the results from each couldn’t be combined into a single set of results. Instead, two distinct sets of results were returned to the user. Although helpful, this wasn’t the experience that most people were looking for.
The Cloud SSA solves this issue by providing a single index in O365 which combines SPO, OneDrive for Business, and User Profile content with nearly anything that can be crawled by SharePoint. The Cloud SSA crawls whatever content it’s told to (SharePoint sites, file shares, web sites, legacy systems, etc.) and effectively sends it to O365 for indexing. Users can then run a search in SPO and receive a single, glorious set of results from a myriad of result sources, all ranked appropriately for the user. The crawled content also becomes available in Delve and (I imagine) the Office Graph. It’s really pretty cool tech. Terribly handy.
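To make the difference concrete, here is a toy Python sketch of the two result shapes. Everything in it is made up for illustration (the function names, the titles, and the scores); it is not how SharePoint ranks anything, only a picture of "two result blocks" versus "one ranked list."

```python
from typing import Dict, List, Tuple

# (title, relevance score). All titles and scores are hypothetical.
Result = Tuple[str, float]


def classic_hybrid(on_prem: List[Result], spo: List[Result]) -> Dict[str, List[Result]]:
    """Query federation: each index answers on its own, so the user gets two blocks."""
    return {
        "on_premises": sorted(on_prem, key=lambda r: r[1], reverse=True),
        "sharepoint_online": sorted(spo, key=lambda r: r[1], reverse=True),
    }


def cloud_ssa(unified_index: List[Result]) -> List[Result]:
    """Unified index: all content lives in one index and comes back as one ranked list."""
    return sorted(unified_index, key=lambda r: r[1], reverse=True)


on_prem = [("Budget.xlsx (file share)", 0.7), ("Team site page", 0.4)]
spo = [("Budget deck (OneDrive)", 0.9), ("Project wiki", 0.5)]

blocks = classic_hybrid(on_prem, spo)   # two separate result sets
merged = cloud_ssa(on_prem + spo)       # one interleaved result set
```

With classic hybrid, the file share item and the OneDrive item never appear in the same list. With the unified index, they are ranked against each other.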
We’ve had several customers interested in the Cloud SSA lately, so we’ve been kicking the tires and learning as much as we can about the solution. Although it has worked great thus far, there are some important limitations that people need to be aware of. Some of these include: no content enrichment, no cross-site publishing, and no custom entity extraction. If these are required, then a traditional SSA will need to be deployed alongside the Cloud SSA. BTW, DON’T click the button to reset the index! If you do, you’ll need to call Microsoft Support and get them to reset your cloud index.
Beyond these, I’ve recently discovered two more important limitations that have heretofore not been identified.
New Limitation: Windows Claims Only
In the last couple of months, I discovered what I believe is a pretty significant limitation to the Cloud SSA. It turns out that SAML identity providers (such as ADFS) are not supported with the Cloud SSA. Only Windows claims will be supported. I brought this to the attention of the mighty Neil Hodgkinson, who talked with the FAST team. They confirmed that SAML authentication (like ADFS) will not be supported with the Cloud SSA.
If I understand all of this correctly, the reason for this lies in the ACL mapping that’s part of the indexing of on-premises content. One of the things the O365 indexing service does is take the ACLs that are on the crawled items and map them to identities in Azure AD. If it finds a match, then great. If there isn’t one, then the ACL is disregarded. The O365 indexing service knows how to do this for Windows claims since they’re nice and consistent. However, all bets are off for non-Windows claims. The FAST team said that there are too many potential claim provider names and that it would be nearly impossible to map them all in an efficient manner.
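Here is a minimal Python sketch of that mapping behavior as I understand it. It is a model for illustration, not Microsoft's implementation: the directory contents, the account names, and the function names are all hypothetical, and the only rule it encodes is "Windows claims get looked up, everything else is dropped."

```python
from typing import Dict, List, Optional

# Hypothetical lookup: on-premises Windows account -> Azure AD user name.
DIRECTORY: Dict[str, str] = {
    "contoso\\alice": "alice@contoso.example",
}

WINDOWS_CLAIM_PREFIX = "i:0#.w|"  # prefix of an on-premises Windows claim


def map_acl_entry(encoded: str, directory: Dict[str, str]) -> Optional[str]:
    """Return the cloud identity for one ACL entry, or None if it cannot be mapped."""
    if not encoded.startswith(WINDOWS_CLAIM_PREFIX):
        return None  # SAML/ADFS and other non-Windows claims: no mapping attempted
    account = encoded[len(WINDOWS_CLAIM_PREFIX):].lower()
    upn = directory.get(account)
    if upn is None:
        return None  # Windows claim with no match in the directory
    return f"i:0#.f|membership|{upn}"


def map_acl(acl: List[str], directory: Dict[str, str]) -> List[str]:
    """Keep only the ACL entries that map to a cloud identity; drop the rest."""
    mapped = []
    for entry in acl:
        identity = map_acl_entry(entry, directory)
        if identity is not None:
            mapped.append(identity)
    return mapped


windows_acl = ["i:0#.w|contoso\\alice"]
adfs_acl = ["i:0e.t|adfs|alice@contoso.example"]

mapped_windows = map_acl(windows_acl, DIRECTORY)
mapped_adfs = map_acl(adfs_acl, DIRECTORY)  # empty: the ACL entry is disregarded
```

An item whose only ACL entries are ADFS claims ends up with an empty mapped ACL in this model, which is the situation that gets it trimmed from everyone's results.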
Don’t believe me? Give this a try (although be warned that it’s a lot of work). On-premises, take a web application and add ADFS as an identity provider. This isn’t a small task, and now’s not the time to go into it. Sorry. Make sure that the Default zone is using NTLM (Search needs to crawl using NTLM in the Default zone). You’ll either need to add ADFS as a second authentication provider, or you’ll need to extend the web app into a new zone.
Once that’s done, find some content in a site and grant permissions to a user from that ADFS identity provider (not Windows). Sign into the site with that ADFS user and verify that you can access the content. Kick off a crawl in the Cloud SSA and wait a couple of minutes to make sure the content gets indexed. Next, sign into SPO using that ADFS user and do a search for the content. You won’t be able to find it.
Now, let’s prove that it works with the Windows claim. Grant permissions on some content to the Windows equivalent for that account. Alternatively, you can use a different account in order to see the two side-by-side. Kick off a crawl again and give it time for indexing. Sign in to SPO again using the account and search for the content again. You should now be able to see the item in the results.
So what’s going on here? The indexing service is failing to map the ADFS claim-encoded identity to the identity in Azure AD. When the Cloud SSA crawled the item, it saw the following as the claim-encoded identity: “i:0e.t|adfs|[email protected]” (“ADFS” was what we called the identity provider). The Cloud SSA is expecting the identity to look like this: “i:0#.f|membership|[email protected]”. Note the differences between the two. The Cloud SSA doesn’t know how to map “i:0e.t|adfs|[email protected]” to “i:0#.f|membership|[email protected]” and thus the index drops the ACL. This then leads to the item getting security-trimmed out of the search results.
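To see exactly where the two strings differ, here is a small Python sketch that splits a claim-encoded identity into its parts. The layout it assumes (a six-character prefix, an optional original issuer, then the value) and the meanings in the lookup table follow my reading of the claim-encoding post linked below, so treat them as assumptions. The address `user@example.com` is a placeholder, not the account from our test.

```python
from typing import NamedTuple, Optional

# Assumed meanings of the last prefix character (the issuer/auth mode).
AUTH_MODE = {"w": "Windows", "f": "Forms", "t": "Trusted STS"}


class EncodedClaim(NamedTuple):
    is_identity: bool               # "i" = identity claim
    claim_type: str                 # e.g. "#" or "e"
    value_type: str                 # e.g. "." (string)
    auth_mode: str                  # e.g. "w", "f" or "t"
    original_issuer: Optional[str]  # e.g. "adfs" or "membership"; None if absent
    value: str


def parse_claim(encoded: str) -> EncodedClaim:
    """Split a string such as 'i:0e.t|adfs|user@example.com' into its parts."""
    prefix, sep, rest = encoded.partition("|")
    if not sep or len(prefix) != 6 or prefix[1:3] != ":0":
        raise ValueError(f"not a claim-encoded identity: {encoded!r}")
    issuer, sep, value = rest.partition("|")
    if not sep:
        # Only one segment after the prefix (e.g. 'i:0#.w|domain\\user'): no issuer.
        issuer, value = None, rest
    return EncodedClaim(prefix[0] == "i", prefix[3], prefix[4], prefix[5], issuer, value)


crawled = parse_claim("i:0e.t|adfs|user@example.com")
expected = parse_claim("i:0#.f|membership|user@example.com")

# Which fields differ between what was crawled and what the index expects?
differences = [
    name for name in EncodedClaim._fields
    if getattr(crawled, name) != getattr(expected, name)
]
```

In this example the user value itself is identical. The strings differ in the claim type, the auth mode, and the issuer name, and the issuer name is whatever the farm administrator happened to call the identity provider.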
The moral of this story is that you can only use the Cloud SSA if you’re using Windows identities. If you’re using some alternative identity provider, then the Cloud SSA is not really going to be an option for you. You could potentially use it as long as you apply a Windows-equivalent ACL for every SAML claim, but nobody in their right mind would want to do that. “Ain’t nobody got time for that!” :)
If you’re interested in learning more about claim encoding, check out the following awesome blog on the matter by Wictor Wilén: http://www.wictorwilen.se/Post/How-Claims-encoding-works-in-SharePoint-2010.aspx.
Hopefully, understanding this limitation will help someone avoid a lot of headaches and wasted time in the future. It’s not fun to get excited about the Cloud SSA, sell the idea to your management and users, get approval, get funding, get the farm patched, and get the Cloud SSA deployed – only to find out that you can’t use it after all. The Cloud SSA is an awesome piece of tech. Just make sure your environment is a candidate before you get too far into it.