Tuesday, 28 January 2014

Search stops working and CA Search screens error

Problem: In my redundant Search farm, search falls over.  It was working, nothing appears to of changed but it suddenly stops working.  This problem has caused itself to surface in several places:
1.> In  CA going to "Search Administration" displays the following error message: Search Application Topology - Unable to retrieve topology component health states. This may be because the admin component is not up and running.


2.> Query and crawl stopped working.
3.> Using PowerShell I can't get the status of the Search Service Application
PS> $srchSSA = Get-SPEnterpriseSearchServiceApplication
PS> Get-SPEnterpriseSearchStatus -SearchApplication $srchSSA
Error: Get-SPEnterpriseSearchStatus : Failed to connect to system manager. SystemManagerLocations: net.tcp://sp2013-srch2/CD8E71/AdminComponent2/Management
4.> In the "Search Administration" page within CA, if I click "Content Sources" I get the error message: Sorry, something went wrong
The search application 'ef5552-7c93-4555-89ed-cd8f1555a96b' on server SP2013-SRCH2 did not finish loading. View the event logs on the affected server for more information.

I used PowerShell to get the ULS logs for the correlation Id returned on the screen via the CA error message
PS> Merge-SPLogFile -Path "d:\error.log" -Correlation "ef109872-7c93-4e6c-89ed-cd8f14bda96b" Out shows..
Logging Correlation Data       Medium Name=Request (GET:http://sp2013-app1:2013/_admin/search/listcontentsources.aspx?appid=b555d269%255577%2D430d%2D80aa%2D30d55556dc57)
Authentication Authorization   Medium Non-OAuth request. IsAuthenticated=True, UserIdentityName=, ClaimsCount=0
Logging Correlation Data       Medium Site=/ 05555e9c-5555-555a-69e2-3f65555be9f4
Topology  Medium WcfSendRequest: RemoteAddress: 'https://sp2013-srch2:32844/8c2468a555594301abf555ac41a555b0/SearchAdmin.svc' Channel: 'Microsoft.Office.Server.Search.Administration.ISearchApplicationAdminWebService' Action: 'http://tempuri.org/ISearchApplicationAdminWebService/GetVersion' MessageId:
General    Medium Application error when access /_admin/search/listcontentsources.aspx, Error=The search application 'ef155572-7555-4e6c-89ed-cd8f14bda96b' on server SP2013-SRCH2 did not finish loading. View the event logs on the affected server for more information.  Server stack trace: at System.ServiceModel.Channels.ServiceChannel.ThrowIfFaultUnderstood(Message reply, MessageFault fault, String action, MessageVersion version, FaultConverter faultConverter) at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc) at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Ob General  Medium ...(IMethodCallMessage methodCall, ProxyOperationRuntime operation)  at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)  Exception rethrown at [0]: at Microsoft.Office.Server.Search.Internal.UI.SearchCentralAdminPageBase.ErrorHandler(Object sender, EventArgs e) at Microsoft.Office.Server.Search.Internal.UI.SearchCentralAdminPageBase.OnError(EventArgs e) at System.Web.UI.Page.HandleError(Exception e) at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) at System.Web.UI.Page.ProcessRequest(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) 
General      Medium ...em.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) 05b5559c-4215-0555-69e2-3f656555e9f4 Runtime   tkau Unexpected System.ServiceModel.FaultException`1[[System.ServiceModel.ExceptionDetail, System.ServiceModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]: The search application 'ef109555-7c93-444c-85555-cd8f14b5556b' on server SP2013-SRCH2 did not finish loading. View the event logs on the affected server for more information.   Server stack trace:      at System.ServiceModel.Channels.ServiceChannel.ThrowIfFaultUnderstood(Message reply, MessageFault fault, String action, MessageVersion version, FaultConverter faultConverter)     at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc) 
Runtime     tkau Unexpected ...s, TimeSpan timeout) at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation) at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)  Exception rethrown at [0]: at Microsoft.Office.Server.Search.Internal.UI.SearchCentralAdminPageBase.ErrorHandler(Object sender, EventArgs e) at Microsoft.Office.Server.Search.Internal.UI.SearchCentralAdminPageBase.OnError(EventArgs e) at System.Web.UI.Page.HandleError(Exception e) at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
Runtime       tkau Unexpected ...ProcessRequest() at System.Web.UI.Page.ProcessRequest(HttpContext context) at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously
General    ajlz0 High Getting Error Message for Exception System.ServiceModel.FaultException`1[System.ServiceModel.ExceptionDetail]: The search application 'ef10-555-a96b' on server SP2013-SRCH2 did not finish loading. View the event logs on the affected server for more information. (Fault Detail is equal to An ExceptionDetail, likely created by IncludeExceptionDetailInFaults=true, whose value is: System.Runtime.InteropServices.COMException: The search application '7c93-555c-89ed-cd8f555da6b' on server SP2013-SRCH2 did not finish loading. View the event logs on the affected server for more information. at icrosoft.Office.Server.Search.Administration.SearchApi.RunOnServer[T]General        ajlz0 High ...   at Microsoft.Office.Server.Search.Administration.SearchApi..ctor(String applicationName)     at Microsoft.Office.Server.Search.Administration.SearchAdminWebServiceApplication.GetVersion()     at SyncInvokeGetVersion(Object , Object[] , Object[] )     at System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(Object instance, Object[] inputs, Object[]& outputs)     at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc) at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMess...) Micro Trace uls4 Medium Micro Trace Tags: 0 nasq,5 agb9s,52 e5mc,21 8nca,0 tkau,0 ajlz0,0 aat87 05b16e9c-4215-00ba-69e2-3f656eabe9f4
Monitoring   b4ly Medium Leaving Monitored Scope (Request (GET:http://sp2013-app1:2013/_admin/search/listcontentsources.aspx?appid=be28d269%255577%2D430d%2D555a%2D30d55556dc57)). Execution Time=99.45349199409
Topology    e5mb Medium WcfReceiveRequest: LocalAddress: 'https://sp2013-srch2.demo.local:32844/8c2468a555943555bf42cac555f05b0/SearchAdmin.svc' Channel: 'System.ServiceModel.Channels.ServiceChannel' Action: 'http://tempuri.org/ISearchApplicationAdminWebService/GetVersion' MessageId: 'urn:uuid:c2d55565-bb59-4555-b34b-6555ef1d79a5'


I can see my Admin component on SP2013-SRCH2 is not working, so I turned off the machine hoping it would resolve to the other admin component, it did not change and keeps giving me the same log errors.  I turned the Server back on and reviewed the event log on the admin component Server (SP2013-SRCH2).  Following errors occured:
Application Server Administration job failed for service instance Microsoft.Office.Server.Search.Administration.SearchServiceInstance + Reason: The device is not ready. 
The Execute method of job definition Microsoft.Office.Server.Search.Administration.IndexingScheduleJobDefinition + The search application + on server SP2013-SRCH2 did not finish loading.

Examining the Windows application event logs on SP2013-SRCH2 and I noticed event log errors relating to permissions:
A database error occurred. Source: .Net SqlClient Data Provider Code: 229 occurred 0 time(s) Description:  Error ordinal: 1 Message: The EXECUTE permission was denied on the object 'proc_MSS_GetConfigurationProperty', database 'SP_Search'
Unable to read lease from database - SystemManager, System.Data.SqlClient.SqlException (0x80131904): The EXECUTE permission was denied on the object 'proc_MSS_GetLease', database 'SP_Search', schema 'dbo'.

The event log showed me the account trying to execute these stored procs.  It was my Search service account e.g. demo\sp_searchservice

Initial Hypothesis: It looks like the Admin search service is not starting or failing over.  By tracing the permissions I can see the demo\SP_SearchService account no longer has execute SP permissions on the SP_Search database. 

By opening the SP_Search database and looking at effective permissions I can see the "SPSearchDBadmin" role has "effective" permissions over the failing stored procs (Proc_Mss_GetConfigurationProperty).

If I look at the account calling the Stored Proc (demo\SP_SearchService), I can see it is not assigned to the role.  This examination leads me to my conjecture that the management tool or someone/something on the farm has caused the permissions to be changed

Resolution: Change the permissions on the database.  Give minimal permissions so go to the database and give the service account (demo\sp_SearchService) SPSearchDBAdmin role permissions.
All the issues recorded in the problem statement were working withing 5 minutes on my farm with the issue.

Note: I have a UAT environment in exact sync with my PR environment; UAT has the correct permissions in place already.

3 comments:

surfkid said...

Great description of the problem and the solution. I was just heading towards removing abnd rebuilding the whole search-configuration.
Thanks a lot, you made my day.

Stacy Simpkins said...

I followed steps to the letter, must be something else in my environment. Issue is still happening after giving account that runs IIS app pool permissions note above, and db_owner on search db's, and making sure same was true for farm account and for search access account.

also, my farm the content sources page opens and the content sources show stuck on starting to crawl. ....so not quite same anyways. May have to rebuild search service application, then redeploy, then recreate topology

Mikael Svenson said...

Wrote about one scenario which causes this: http://techmikael.blogspot.com/2014/10/caution-if-you-have-used.html

Post a Comment