Feedback is most welcome, and additions as well as corrections can be expected. Since I've got some sprint work that need some focus this is published as is and I'll get back to it in when I can and will try address any comments as well.
Overview
Those not familiar with/interested in the internals of IdM queue processing can consider looking at the first part: On queue processing, or the lack thereof. Part #1
I may update this with the matching screens from the Admin UI over time.
And as always remove "with (nolock)" from the queries when running not running on SQL Server.
I haven't had time to test these on Oracle but leaving this part in will guarantee a failure, and the case syntax is a bit differnt in Oracle. This is one of the first things I want to get back to.
Getting an overview of the queues
One of the most important things to do in case of a productive stand-still or issue is to get an overview of what's in the different queues.
Link evaluations, approvals and workflows have separate queues and processing of them is done by different threads in the dispatcher(s).
Jobs are simply set to state=1 and scheduletime < now in the mc_jobs table.
Jobs and actions
As mentioned above, jobs do not really have a queue. They are scheduled to run by having scheduletime set and state set to 1. The dispatcher will start runtime(s) to process jobs if the mc_dispatcher_check procedure returns 1 or more standard jobs to run. The java runtime will use the procedure mc_getx_job to reserve a job from the available jobs. Once running the state in mc_jobs changes to 2.
Just to clarify, ajob sits outside the Identity Store(s) in a job folder and usually works with bulk processing and contains 1 or many passes. Actions are inside the workflow of an Identity Store and can only contain 1 pass and process 1 entry at a time. To slightly confuse the matter, the configuration of an action task is a job, in the mc_jobs table, and the logs it creates are stored in the mc_logs table. There's a link between the task in mxp_tasks to mc_jobs on mxp_tasks.jobguid = mc_jobs.jobgui.
With this knowledge a query listing jobs and provisioning actions that are running can look like this:
select name,case when provision=1 then 'Action' else 'Job' end type, CurrentEntry, Current_Machine from mc_jobs with(nolock) where state = 2
This produces output like this:
Note that the CurrentEntry column in mc_jobs is updated every 100 entry, or every 30 seconds by the runtimes.
The provisioning queue & semaphores
The provisioning queue is based on the table mxp_provision. To process parts of the queue a dispatcher must first set a semaphore that indicates that other dispatchers should keep away from processing the same type of task. This is done by setting a semaphore (basically its own Id as owner along with a timestamp) in the mc_semaphore table. The timestamp is updated as the thread is processing entries, and a semaphore whose timestamp is older than 300 seconds is considered dead. This means that if you have conditional statements taking a very long time to run so that the dispatcher thread is not able to update the timestamp within 300 seconds the semaphore is released and another dispatcher will start processing conditional statements as well. That means trouble because the two threads risk running the same conditional mskey,action,audit combination!
The provisioning queue is divided into views according to the threads in the dispatcher: mxpv_grouptasks_ordered/mxpv_grouptasks_unordered/mxpv_grouptasks_conditional/mxpv_grouptasks_switch/mxpv_grouptasks_approval/
mxpv_grouptasks_attestation
These views will at most contain 1000 entries from a Top 1000 limiter. As mentioned in part #1, actions that are to be processed by runtime engines are picked up by a procedure and has no view.
The link evaluation queue
This queue contains assignments that need to be evaluated. Any mxi_link entry with mcCheckLink < now is in this queue. This includes role/privilege assignments and entry references such as manager.
The dispatcher processes this from the view mxpv_links. This view will contain 0 entries in normal situations, up to 1000 under load. To get the real number of links that need evaluations you can run:
SELECT count(mcUniqueId) FROM mxi_link WHERE (mcCheckLink < getdate()) AND (mcLinkState IN (0,1))
To see if a specific user has privileges that are queued for evaluation, or if a privilege has entries where it's state is still to be evaluated:
-- Assignments to evaluate for 'User Tony Zarlenga' SELECT count(mcUniqueId) FROM mxi_link WHERE (mcCheckLink < getdate()) AND (mcLinkState IN (0,1)) and mcThisMskey in (select mcmskey from idmv_entry_simple where mcMskeyValue = 'User Tony Zarlenga') -- User assignments to evaluate for privilege 'PRIV.WITH.APPROVAL' SELECT count(mcUniqueId) FROM mxi_link WHERE (mcCheckLink < getdate()) AND (mcLinkState IN (0,1)) and mcOtherMskey in (select mcmskey from idmv_entry_simple where mcMskeyValue = 'PRIV.WITH.APPROVAL')
Listing actions ready to be run by runtime engines
Runtime actions are listed in the provisioning queue with actiontype=0. Combined with state=2 (ready to run) and exectime < now the entry is ready to be processed by a runtime. A very basic query listing number of entries ready for processing by different actions is:
select count(P.mskey) numEntries,P.actionid, t.taskname from mxp_provision P with(NOLOCK), mxp_tasks T with(NOLOCK) where P.ActionType=0 and T.taskid = P.ActionID group by p.ActionID,t.taskname
Unless you have a lot of actions with delay before start configured actions will usually have an exectime in the past. This query will create a simple result showing the entries that can be processed by runtimes:
Listing actions ready to be run by runtime engines and the state of the job
In most cases this is only part of the full picture. You really want to know if a runtime is actually working on those entries as well. Let's add mc_jobs and mc_job_state to the query to get a bit more detail:
select count(P.mskey) numEntries,P.actionid, t.taskname,js.name as jobState from mxp_provision P inner join mxp_tasks T on T.taskid = P.ActionID left outer join mc_jobs J with(nolock) on J.JobGuid = T.JobGuid left outer join mc_job_state JS with(nolock) on j.State = JS.State where P.ActionType=0 and P.state=2 and T.taskid = P.ActionID group by p.ActionID,t.taskname,js.name
The current reason for my system not processing anything is getting clearer:
No actions are running so something is blocking or stopping the runtimes from starting and I know to look at the dispatcher. Since I've manually stopped it it's no big surprise and troubleshooting is simple.
Just a few of the actions/jobs are running
If you think that not enough runtimes are being started and see situations like this:
You should look at item 5 in the checklist below and also have a look at the properties and policy of the dispatcher
Dispatcher properties and policies
By increasing the Max concurrent rt engines setting to 10, and max rt engines to start to 3 the situation is quickly changed to the point where it's difficult to create a screenshot:
Troubleshooting actions/jobs not starting
A quick checklist for troubleshooting:
- Check that the dispatcher process (mxservice.exe on windows) is running
- Check how many java processes you have in task manager (or using pf or similar on Unix)
- If there are no or just a few java processes then the runtimes are most likely not started by the dispatcher
- Check prelog.log for startup issues
- If you have lots of java processes but no actions running, then the runtime is probably having problems connecting to the db or reserving a job
- Check prelog.log
- If there are no or just a few java processes then the runtimes are most likely not started by the dispatcher
- Check that the dispatchers are allowed to run the jobs that are queued
- A job listed with state ERROR will not run, and has to be forced to restart. Check it's logs for errors though, they end up error state for a reason (most of the time)
- Check the database activity monitor, reports or using queries from IDM SQL Basics #2: Locating problem queries to check if
- If the procedure mc_dispatcher_check is running for a long time the dispatcher is unable to perform the check on the queue to see how many actions are ready for processing and java runtimes will not be started.
- If the procedure mxp_getx_provision is running for a long time the result is many java processes in the system but they are unable to allocate jobs
Listing the number of items and their state in the queue
I would usually start off with the following query that lists the number of entries per task, per state and including the state of the linked Job for action tasks:
select count(P.mskey) numEntries,t.taskid,t.taskname,A.Name ActionType,s.name StateName, ISNULL(JS.Name,'not an action task') JobState from mxp_provision P with(nolock) inner join mxp_Tasks T with(nolock) on T.taskid = P.actionid inner join mxp_state S with(nolock) on S.StatID = P.state inner join MXP_ActionType A with(nolock) on A.ActType=P.ActionType left outer join mc_jobs J with(nolock) on J.JobGuid = T.JobGuid left outer join mc_job_state JS with(nolock) on j.State = JS.State group by t.taskid,T.taskname, A.name, S.name, JS.Name order by A.name, S.name
I've started my dispatcher test task described in Part #1 for 1000 entries The query above gives me a result like this during the processing
(click to enlarge) | A quick explanation of some of the type/state combinations and what would process them
Action Task/Ready To Run: Action that is ready to be processed by a runtime + JobStatus: The state of the job linked to the Action Task. If it's Idle it means a runtime has not picked this up yet.
Conditional, Switch and (un)Ordered Tasks are processed by dispatchers that have a policy that allows Handle Tasks. Ready to run for conditional or switch task means i'ts ready for evaluation Ready to run for Ordered/Unorderd task means the workflow can be expanded into the queue Expanded OK means the workflow at this level is expanded Waiting generally means that its waiting for a sub-process or child event to finish |
The final view of the provisioning queue, with current entry count for actions
Since the mc_jobs table contains a column named CurrentEntry we can also see how many entries running actions have processed using:
select count(P.mskey) numEntries,t.taskid,t.taskname,A.Name ActionType,s.name StateName, case when js.name = 'Running' then 'Running, processed:'+cast(ISNULL(J.CurrentEntry,0) as varchar) else js.name end state from mxp_provision P with(nolock) inner join mxp_Tasks T with(nolock) on T.taskid = P.actionid inner join mxp_state S with(nolock) on S.StatID = P.state inner join MXP_ActionType A with(nolock) on A.ActType=P.ActionType left outer join mc_jobs J with(nolock) on J.JobGuid = T.JobGuid left outer join mc_job_state JS with(nolock) on j.State = JS.State group by t.taskid,T.taskname, A.name, S.name, case when js.name = 'Running' then 'Running, processed:'+cast(ISNULL(J.CurrentEntry,0) as varchar) else js.name end order by A.name, S.name
The result is quite useful as it's now possible to see how many entries the actions that are running have processed so far (click to see):
This will have to do for revision one. If I get time to add more this week I will, but there is patches and SPs to work on as well.