SharePoint 2016 has been infused with some of the data protection capabilities that already exist in Exchange 2013 and Office 365. Office 365 provides these capabilities for email through Data Loss Prevention (DLP) in Exchange, Outlook, and OWA. In SharePoint 2016, you’ll use the search index to find existing sensitive content (content that is deemed sensitive in one way or another, such as social security numbers or credit card numbers) in a variety of repositories, including SharePoint sites and OneDrive for Business (ODB).
Finding sensitive information is a combination of pattern matching and proximity scanning. Proximity scanning analyzes the content surrounding the pattern-matched data to corroborate that the match is, indeed, a sensitive information type.
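To make the two-step idea concrete, here is a minimal Python sketch of pattern matching followed by proximity scanning. It is not Microsoft’s implementation: the regular expression, the keyword list (KEYWORDS), and the character window (WINDOW) are all hypothetical values chosen only for illustration.

```python
import re

# Hypothetical SSN pattern: 3 digits, 2 digits, 4 digits separated by hyphens.
# Microsoft's real definition is more detailed; this is only an illustration.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical corroborating keywords (the real list ships with SharePoint).
KEYWORDS = ("social security", "ssn")

# Assumed number of characters scanned on each side of a pattern match.
WINDOW = 300


def find_sensitive(text: str) -> list[str]:
    """Return pattern matches that are corroborated by a nearby keyword."""
    results = []
    lowered = text.lower()
    for match in SSN_PATTERN.finditer(text):
        # Proximity scan: examine only the text surrounding this match.
        start = max(0, match.start() - WINDOW)
        end = min(len(text), match.end() + WINDOW)
        context = lowered[start:end]
        # Simple substring test; a real engine would be more careful.
        if any(keyword in context for keyword in KEYWORDS):
            results.append(match.group())
    return results


print(find_sensitive("Employee social security number: 012-34-5678"))  # ['012-34-5678']
print(find_sensitive("Order 012-34-5678 shipped"))  # []
```

The second call shows the point of proximity scanning: the number matches the pattern, but with nothing around it to corroborate the match, it is not reported.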
SharePoint 2016 will ship with 51 different sensitive information types, including ABA routing numbers, various passport numbers, driver’s license numbers, ID card numbers, bank numbers, and US social security numbers. Before reading further, I suggest you take a look at these information types. I could have listed them here, but there is little value in doing so. Just click the link, familiarize yourself with these information types, and then read the rest of this post.
The sensitive data is exposed through an eDiscovery site – not through the normal query in the search center, so I’ll walk you through how to surface sensitive data in an eDiscovery site in SharePoint 2016.
The first step is to create an eDiscovery site. This can be any new site collection in which you choose the eDiscovery template from the Create Site Collection page in Central Administration.
After the site is created, you’ll want to navigate to that site and click the Create New Case button (not illustrated) to create a new eDiscovery case. Once you have created a new case, you’ll be presented with the default home screen for that subsite in the eDiscovery site collection.
To conduct my test, I decided to see if SharePoint could find social security numbers (hence, “US SSN”) in two different locations: a Word document in a team site and a microblog post. So, I created a team site called Team (http://team) and created a Word document containing various fictitious US social security numbers, such as 012-34-5678. I also created a microblog post in the team’s newsfeed input box and entered a different (fictitious) social security number there as well.
I then ensured that the sites were indexed properly by entering other information and successfully querying the index for that information. I also set up a second eDiscovery site and ensured that the same content could be found through a common keyword, such as “security”.
My Word document looked like this:
My newsfeed (microblogs) looked like this:
I was able to query the index from an Enterprise Search Center on the word “security” and get microblogs and the Word document in the result set:
To set up the eDiscovery case correctly, I navigated to the US SSN (United States Social Security Number) case site and created a new eDiscovery set by clicking New Item. I populated that page as illustrated here:
Note that the filter for sensitive data is provided by Microsoft and is hard-coded into SharePoint 2016: when a user enters the proper syntax, the eDiscovery Center finds content that matches the pattern of the named sensitive information type, in this case SensitiveType="U.S. Social Security Number (SSN)". Spelling and spacing are important here. If either is wrong, the filter will not return any results.
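Because a single misplaced character silently returns nothing, it can help to check the type name before pasting it into the filter box. The sketch below assumes a hand-maintained, partial list of type names (KNOWN_TYPES is hypothetical and should be checked against Microsoft’s published list) and uses Python’s difflib to suggest the nearest correct spelling.

```python
import difflib

# Illustrative, partial list of sensitive information type names.
# This subset is an assumption; verify names against Microsoft's list.
KNOWN_TYPES = [
    "ABA Routing Number",
    "Credit Card Number",
    "U.S. Social Security Number (SSN)",
]


def sensitive_type_clause(name: str) -> str:
    """Build a SensitiveType clause, refusing names that are not an exact match."""
    if name not in KNOWN_TYPES:
        # Suggest the closest known spelling instead of silently matching nothing.
        hint = difflib.get_close_matches(name, KNOWN_TYPES, n=1)
        suggestion = f" Did you mean {hint[0]!r}?" if hint else ""
        raise ValueError(f"Unknown sensitive type {name!r}.{suggestion}")
    return f'SensitiveType="{name}"'


print(sensitive_type_clause("U.S. Social Security Number (SSN)"))
# SensitiveType="U.S. Social Security Number (SSN)"
```

Calling sensitive_type_clause("US Social Security Number (SSN)") raises a ValueError that points back to the correctly punctuated name.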
Note that to find the needed information in a newsfeed, I tried to query the user’s My Site/OneDrive for Business site. Because newsfeed data is held in the social database and not in a site collection’s content database, using eDiscovery to find newsfeed data is not going to work. (If someone has a way to get this to work, please reply to this post and let us know how it’s done.)
Once the filter is in place, execute it by clicking the Get Statistics button. SharePoint then filters the index and gives you an ellipsis link to click (you can barely see the three blue dots just below the Items column heading in this screenshot):
Click the ellipsis link and you’ll get the number of content items that match the filter. Click the Preview Results button at the bottom of the page (not illustrated) and you’ll get the list of items that match the filter:
Setting this up for Search and Export is slightly different. In this action, you’re committing a query as opposed to a filter. A filter asks a yes/no question and is used for fields that contain exact data or require an exact match, whereas a query asks “How well does this content item match?” Queries calculate relevance and report gradations of comparison; filters do not. Your query string is the same as that of the filter, in this case SensitiveType="U.S. Social Security Number (SSN)". You’ll execute the query and receive a set of content items that match it. Because these SensitiveType queries are pattern matches, they will function more like a filter than a query. Populate your screen as illustrated here:
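The filter-versus-query distinction can be sketched in a few lines of Python. The Item records, their field names, and the term-count scoring in run_query are hypothetical simplifications made for illustration; SharePoint’s actual ranking model is far more sophisticated.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    file_type: str
    body: str


# Hypothetical sample content.
ITEMS = [
    Item("SSN list", "docx", "social security numbers for payroll social security"),
    Item("Security policy", "docx", "security guidelines for staff"),
    Item("Budget", "xlsx", "quarterly numbers"),
]


def apply_filter(items, file_type):
    """Filter: a yes/no test on an exact field value, with no ranking."""
    return [item for item in items if item.file_type == file_type]


def run_query(items, terms):
    """Query: score each item by how well it matches, then rank by score.

    Hypothetical relevance model: count occurrences of the query terms.
    """
    scored = []
    for item in items:
        words = item.body.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((score, item.title))
    # Highest score first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


print([item.title for item in apply_filter(ITEMS, "docx")])  # ['SSN list', 'Security policy']
print(run_query(ITEMS, ["security"]))  # ['SSN list', 'Security policy']
```

A SensitiveType match is a yes/no pattern test, which is why it behaves more like apply_filter than like run_query.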
After entering the SensitiveType query, if you click Advanced Query Options, you’ll see a fuller version of the query that the system is using to find the information you’re after:
Note: the SharePoint query uses a Boolean operator to combine the query and the path. You’ll see in other literature from Microsoft that Boolean operators can be used to combine multiple SensitiveType queries as well.
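As a sketch of how such Boolean combinations might be assembled, the hypothetical build_query helper below ORs several SensitiveType clauses together and ANDs the result with a path restriction. The Path:"..." clause follows general Keyword Query Language conventions; the exact string SharePoint shows under Advanced Query Options may differ, so treat the output format as an assumption.

```python
from typing import Optional


def build_query(types: list, path: Optional[str] = None) -> str:
    """Combine SensitiveType clauses with OR, then AND the result with a path."""
    if not types:
        raise ValueError("at least one sensitive type is required")
    clauses = [f'SensitiveType="{name}"' for name in types]
    query = " OR ".join(clauses)
    if len(clauses) > 1:
        # Parenthesize so the AND applies to the whole OR group.
        query = f"({query})"
    if path:
        query = f'{query} AND Path:"{path}"'
    return query


print(build_query(["U.S. Social Security Number (SSN)", "Credit Card Number"], "http://team"))
# (SensitiveType="U.S. Social Security Number (SSN)" OR SensitiveType="Credit Card Number") AND Path:"http://team"
```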
Click the Search button to see the results. Assuming you’ve entered your information correctly and the index is functioning correctly, a Query Statistics section will appear indicating how many content items match the query; if the SharePoint tab (bottom part of the screen) is in focus, you’ll also see a list of the content items that match the query.
Sensitive information types only kick in when there is proximity data associated with the match. The proximity requirements come from the definition of the sensitive information type named in the expression entered into the query or filter input box. For example, I created a document containing nothing more than a single social security number. That document could be found through the normal search processes, but it was not surfaced in my eDiscovery Center because, even though the content matched the pattern, there was no proximity data associated with it. I then entered the word “social” into the document along with 12 other words and re-crawled the content. The document still didn’t appear in the eDiscovery Center.
Only when I entered “social security” or another part of the exact phrase did the document appear. Word order mattered as well: “security social” was not enough proximity data for the document to be considered a sensitive information type.
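These results are consistent with the keyword being evaluated as an ordered phrase rather than as a set of independent words. That is an inference from my testing, not documented behavior. The hypothetical Python sketch below contrasts the two interpretations (contains_phrase and contains_all_words are illustrative names, not SharePoint functions) on the samples above.

```python
import re

TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    """True only if the phrase's words appear consecutively and in order."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def contains_all_words(text: str, phrase: str) -> bool:
    """Bag-of-words check: every word is present, order ignored."""
    return set(tokens(phrase)) <= set(tokens(text))


for sample in ("social security 012-34-5678", "security social 012-34-5678", "social 012-34-5678"):
    print(sample, contains_phrase(sample, "social security"), contains_all_words(sample, "social security"))
# social security 012-34-5678 True True
# security social 012-34-5678 False True
# social 012-34-5678 False False
```

Only the ordered-phrase check reproduces what I saw: “security social” satisfies the bag-of-words test but not the phrase test.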
This means that not all information that matches the pattern defined in the information type will appear as sensitive information. The proximity data will need to appear with it.
One final note: while the SensitiveType query will work in the eDiscovery site, it will not work in a regular search query, for example, in the Enterprise Search Center.
Summit 7 Systems