What a mighty doozy this one was… a couple of weeks, a few hairs and 7.4 panic attacks later, I think I’ve had one of the most twisted SharePoint issues I’ve ever had to deal with. In my 6 years of working with SharePoint I’ve only had to open a support case with Microsoft one other time. I take great pride in being able to solve stuff on my own, but this was one of those problems that had me going in circles. Hopefully this write-up will help others in the future.

Environment

  • SharePoint 2010 Enterprise, December 2011 Cumulative Update (14.0.6114.5000)
  • 3 Load Balanced WFEs
  • 2 Application Servers
  • 1 User Profile Service Application
  • User Profile Service and User Profile Synchronization Service both running on APP1 server
  • User Profile Service Application has 4 Custom User Profile Properties and 1 Property set to export to Active Directory (Picture to thumbnailPhoto)

Since there were quite a few variables in troubleshooting the problems (which I haven’t mentioned yet), I’ll outline everything that happened as a timeline.

February 2012: Users start to use My Sites to upload their pictures
We’re syncing these pictures to Active Directory so that they can be reused as the users’ Lync and Outlook profile pictures.

February 11, 2012: Deployed December 2011 Cumulative Update
This is a story for another day, but in the end everything turned out OK. Lesson learned here: be wary of running User Profile Service synchronizations while troubleshooting UPA issues after deploying a CU. Make sure you disable the “My Site Cleanup Job” and follow the guidance from Joanne Klein here. I learned this the hard way.

February 16, 2012: Installed some Windows Updates
Everything seemed to be normal here.

February 22, 2012: First Reports of Users Not Seeing Their Pictures in Lync & Outlook
Comparing Active Directory against the pictures in SharePoint confirmed that there was indeed a mismatch. Photos had not been exported to AD since February 17, 2012. Uh oh, could it have been the Windows Updates? Maybe a weekly Timer Job somewhere that regressed with the December 2011 CU? Maybe a combination of both? The errors reported in the FIM MIIS Client and Event Viewer are pasted below:

FIMSynchronizationService Event ID 6126

The management agent “MOSS-63649d6d-ab5f-4eda-8c6a-6e2b65a419c7” completed run profile “MOSS_DELTAIMPORT_51f827d4-b836-4851-89de-daf209327762” with a delta import or delta synchronization step type. The rules configuration has changed since the last full import or full synchronization.

User Action
To ensure the updated rules are applied to all objects, a run with step type of full import and full synchronization should be completed.

FIMSynchronizationService Event ID 6801

The extensible extension returned an unsupported error.
The stack trace is:

“System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request)
at System.Net.WebClient.DownloadData(Uri address)
at Microsoft.Office.Server.UserProfiles.ManagementAgent.ProfileImportExportExtension.DownloadPictures(ProfileChangeData[] profiles)
at Microsoft.Office.Server.UserProfiles.ManagementAgent.ProfileImportExportExtension.Microsoft.MetadirectoryServices.IMAExtensibleFileImport.GenerateImportFile(String fileName, String connectTo, String user, String password, ConfigParameterCollection configParameters, Boolean fFullImport, TypeDescriptionCollection types, String& customData)
Forefront Identity Manager 4.0.2450.34”

After trying to troubleshoot this for a whole day with no leads, I threw in the towel and phoned MSFT support.

February 23-27, 2012: Working with MSFT SharePoint Support Engineer Trying to Restore the UPA
In short, we tried to create new UPAs in production as well as staging with restored Profile and Social Databases, to no avail. There was one restore scenario that seemed to work when we left it on the Friday, but when I came in on Monday, the sync errors were happening with the new UPA as well. I also learned that there are quite a few URLs hard-coded into the User Profile and Social Databases. For example, when we restored the databases to the QA environment, users in the QA environment were getting redirected to their Production My Sites. I once again threw in the towel with this Support Engineer when he suggested that there would be no way to recover the User Profile and Social DBs and that his seniors recommended that all of my users recreate all of their profile information. This was completely unacceptable. By the way, if you find yourself on a support call with MSFT or anyone else for that matter, don’t always be so willing to do whatever they recommend. There were quite a few instances where I had to disagree with a troubleshooting step because it would have made my production environment unavailable to my users or resulted in data loss. So use your own best judgement and common sense when working with someone who doesn’t have to deal with your end users.

February 28-March 1, 2012: Waiting Around for Call from Escalation Engineer… But Had Some Revelations
I was supposed to hear back from someone within 48 hours, but did not. Instead, I had to go through other routes to get some attention via my Technical Account Manager (TAM). In the meantime, I was sleuthing around the 404 error message and discovered that there was something awry with some of the user profile pictures. I recorded this finding/bug here. I didn’t have a chance to validate that the FIM Sync errors were related to the bad CMYK pictures, but that was the hunch…
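
For anyone chasing a similar 404, one way to narrow it down to specific pictures is to request each profile picture URL and note which ones do not come back with a 200. The following is only a rough sketch in Python, not what I ran at the time: the function names are made up for illustration, you would have to supply your own list of picture URLs, and it does not handle the Windows authentication that a real SharePoint farm will almost certainly require.

```python
import urllib.error
import urllib.request


def picture_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url (hypothetical helper).

    HTTP errors such as 404 are returned as their status code instead of
    being raised, so one bad picture does not stop the whole check.
    """
    try:
        with opener(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def find_broken_pictures(urls, opener=urllib.request.urlopen):
    """Return (url, status) pairs for every URL that did not return 200."""
    broken = []
    for url in urls:
        status = picture_status(url, opener)
        if status != 200:
            broken.append((url, status))
    return broken
```

The opener argument is there so that an authenticated opener (or a fake one for testing) can be passed in place of the default urlopen.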

March 2, 2012: Got another Support Engineer and… success! Sort of…
We spent about 4 hours in total and eventually reached a semi-conclusion. Up to this point, most of the attention had been on the UPA, as that was the most probable cause of failed syncs with Active Directory. This time, instead of spending too much time trying to recreate and restore the UPA in various stages, I was able to change the troubleshooting direction to focus on the user profile pictures instead. With this lead, the Support Engineer suggested that we remove the Picture mapping to Active Directory and then perform a Full Synchronization. Before running the Full Sync, I mentioned that the last time I did this, all of the profiles got deleted. After disabling the My Site Cleanup Job, we ran the Full Sync and were indeed able to observe that all the user profiles were marked for deletion using

Select * from userprofile_full (nolock) where bDeleted = 1

on the User Profile Database. That was pretty nerve-racking. We then ran a few more syncs to confirm that the user profiles were flipped back to the do-not-delete state. We also confirmed that there were no more sync errors. Woohoo! User profile pictures were definitely the problem, causing the FIM sync to fail.

Resolving the bad CMYK pictures problem
This seems to be a bug in SharePoint, and the workaround I’ve found is to delete the offending thumbnail (the large thumbnail) generated by SharePoint and then replace it with the medium thumbnail (which works). You can follow that thread here. After resolving the picture issue, I was once again able to successfully export all the user profile images from SharePoint to Active Directory. So in retrospect, if one of these images causes an exception during the sync, FIM taps out and does not even attempt to export the other, working pictures to Active Directory.
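
If you want to hunt for suspect pictures in bulk, here is a rough Python sketch of one way to flag them. It assumes the uploaded pictures are JPEGs and relies on the fact that a JPEG declares its number of color components in its start-of-frame segment: 3 for the usual color images, 4 for CMYK (or the related YCCK encoding). The function names are hypothetical, and this only inspects the file header; it does not prove that SharePoint will choke on a given file.

```python
import struct

# Start-of-frame markers. 0xC4, 0xC8 and 0xCC are in the same range but are
# not frame headers, so they are deliberately left out.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_component_count(data):
    """Return the component count from the first frame header, or None.

    None means the bytes are not a JPEG or no frame header was found.
    """
    if len(data) < 4 or data[0:2] != b"\xff\xd8":
        return None  # missing the JPEG start-of-image marker
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None  # not positioned on a marker, give up
        marker = data[pos + 1]
        if marker == 0xFF:
            pos += 1  # fill byte before a marker
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2  # standalone markers carry no length field
            continue
        if marker in (0xD9, 0xDA):
            return None  # end of image or scan data reached without a frame
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Layout: marker(2) length(2) precision(1) height(2) width(2) count(1)
            index = pos + 9
            return data[index] if index < len(data) else None
        pos += 2 + length  # skip this segment
    return None


def looks_like_cmyk_jpeg(data):
    """True when the JPEG declares four color components (CMYK or YCCK)."""
    return jpeg_component_count(data) == 4
```

To use it, read each picture's bytes (for example with open(path, "rb").read()) and pass them to looks_like_cmyk_jpeg; anything flagged is a candidate for the thumbnail workaround described above.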

Recap & Lessons Learned

  1. User Profile Service Application and FIM Sync issues do not always require a rebuild of the UPA.
  2. If you rebuild your Sync DB or Connection to Active Directory, you will lose all of your Profile Property Mappings.
  3. If you rebuild your Sync DB or Connection to Active Directory, your next sync (incremental or full; the first incremental will force a full) will result in all of your profiles getting marked for deletion.
  4. To prevent your User Profiles from getting deleted, disable the My Site Cleanup Timer Job.
  5. Don’t believe the Support Engineer when he says it’s not possible to restore the Profile and Social DBs (YMMV).
  6. Don’t perform recommended actions that may cause downtime or loss of data. Use your common sense and don’t jeopardize your users’ data.
  7. You know your environment best, and sometimes you have to go with your gut on an issue. Having a second pair of eyes and helpful suggestions was definitely appreciated, but if I had let the Support Engineers continue their scripts, we would still be trying to recreate and restore the UPA to no avail.