Cleanly Restarting OID After A 9iAS 9.0.2 Crash
A common problem after an Oracle 9iAS 9.0.2 crash is that you can't restart OID (Oracle Internet Directory) using OIDCTL.
For example, suppose the server is bounced without 9iAS being shut down cleanly. When you reboot the machine and use DCMCTL to check the status of the OC4J instances before starting them, you get the following error message:
C:\ocs_onebox\infra\dcm\bin>dcmctl getState -V
ADMN-202026
A problem has occurred accessing the Oracle9iAS infrastructure database.
Base Exception:
oracle.ias.repository.schema.SchemaException:Unable to connect to Directory Server:javax.naming.CommunicationException: markr.plusconsultancy.co.uk:4032 [Root exception is java.net.ConnectException: Connection refused: connect]
Please, refer to the base exception for resolution, or call Oracle support.
Anyone who administers 9iAS 9.0.2 boxes will recognise this error: OID is down, so DCMCTL cannot connect to the directory server and cannot function. Assuming the infrastructure database is actually up, this is usually resolved by issuing the command:
C:\ocs_onebox\infra\bin>oidctl server=oidldapd configset=0 instance=1 start
which starts an OID instance. Sometimes, however, this fails and you get the error message:
C:\ocs_onebox\infra\bin>oidctl server=oidldapd configset=0 instance=1 start
*** Instance Number already in use. ***
*** Please try a different Instance number. ***
This is telling you that there's already an OID process running with instance number 1, and therefore you have to use a different instance number. You could simply run OIDCTL again with a different instance number. However, that leaves a 'stray' OID process (instance number 1) hanging around; what you should really do is get rid of this stray process and then run OIDCTL with the original instance number of 1.
When you use the OIDCTL command, what actually happens behind the scenes is that a row is inserted or updated in the ODS.ODS_PROCESS table. The row contains the instance number (which must be unique), the process ID (pid), and a flag called 'state', which has four values, 0, 1, 2 and 3, standing for stop, start, running and restart. A second process, OIDMON, polls the ODS.ODS_PROCESS table and acts on each row according to its state:
- state=0: OIDMON reads the pid and stops that process.
- state=1: OIDMON starts a new process and updates the row with the new pid.
- state=2: OIDMON reads the pid and checks that a process with that pid is running. If it isn't, OIDMON starts a new process and updates the pid.
- state=3: OIDMON reads the pid, stops the process, starts a new one and updates the pid accordingly.
If OIDMON can't start the server for some reason, it retries 10 times, and if it is still unsuccessful, it deletes the row from the ODS.ODS_PROCESS table. In other words, OIDCTL only inserts or updates state information, and OIDMON reads the rows in ODS.ODS_PROCESS and performs the specified tasks based on the value of the state column.
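To make the division of labour between the two programs concrete, here is a toy Python model of the OIDCTL/OIDMON interaction described above. It is not Oracle's code: the Row and Host classes, the simulated pids, and the choice to mark a row as state=2 after a successful start or restart are assumptions made purely for illustration.

```python
# Toy model of OIDCTL writing state rows and OIDMON acting on them.
# Assumptions (illustrative only, not Oracle's implementation): rows live in a
# dict keyed by instance number, processes are simulated pids in a set, and a
# row is marked state=2 (running) after a successful start or restart.
import itertools
from dataclasses import dataclass, field
from typing import Optional

STOP, START, RUNNING, RESTART = 0, 1, 2, 3
MAX_RETRIES = 10  # OIDMON retries a failed start 10 times

@dataclass
class Row:
    instance: int
    state: int
    pid: Optional[int] = None

@dataclass
class Host:
    """Simulated machine that tracks which pids are alive."""
    fail_starts: bool = False  # True simulates a server that will not start
    live: set = field(default_factory=set)
    _pids: itertools.count = field(default_factory=lambda: itertools.count(1000), repr=False)

    def start(self) -> Optional[int]:
        if self.fail_starts:
            return None
        pid = next(self._pids)
        self.live.add(pid)
        return pid

    def stop(self, pid: int) -> None:
        self.live.discard(pid)

def oidctl(table: dict, instance: int, state: int) -> None:
    """OIDCTL only inserts or updates a row; it never touches processes."""
    if instance in table:
        table[instance].state = state
    else:
        table[instance] = Row(instance, state)

def start_with_retries(host: Host) -> Optional[int]:
    for _ in range(MAX_RETRIES):
        pid = host.start()
        if pid is not None:
            return pid
    return None

def oidmon_poll(table: dict, host: Host) -> None:
    """One polling pass of a simulated OIDMON over the ODS_PROCESS rows."""
    for instance, row in list(table.items()):
        if row.state == STOP:
            if row.pid is not None:
                host.stop(row.pid)  # read the pid and stop that process
            continue
        if row.state == RUNNING and row.pid in host.live:
            continue  # process is alive, nothing to do
        if row.state == RESTART and row.pid is not None:
            host.stop(row.pid)  # restart = stop, then start below
        # START, RESTART, or RUNNING with a dead pid: start a new process
        pid = start_with_retries(host)
        if pid is None:
            del table[instance]  # still failing after MAX_RETRIES: drop the row
        else:
            row.pid, row.state = pid, RUNNING
```

Note that all the process management lives in oidmon_poll. The crash scenario described next corresponds to OIDMON itself disappearing: nothing calls oidmon_poll any more, so the rows stay in the table whatever has happened to the processes they describe.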
This all works fine except when 9iAS crashes. When that happens, OIDMON exits but the OIDLDAPD processes are not killed; in addition, stray rows are often left in the ODS.ODS_PROCESS table, and these are detected when you try to restart the oidldapd instance after a reboot.
The proper way to deal with this is to take the following steps:
- Kill any stray OIDLDAPD processes still running (if you haven't rebooted the server since the crash)
- Delete any rows in the ODS.ODS_PROCESS table. Connect to the IASDB database as the ODS user, or as SYSTEM, and run:
  select * from ODS.ODS_PROCESS; (there should be at least one row)
  delete from ODS.ODS_PROCESS;
  commit;
- Restart the OID instance using:
C:\ocs_onebox\infra\bin>oidctl server=oidldapd configset=0 instance=1 start
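The stray-row problem and the cleanup in the steps above can be reproduced in miniature. The sketch below uses an in-memory SQLite table as a hypothetical stand-in for ODS.ODS_PROCESS, with simplified columns (instance, pid, state), and assumes that OIDCTL rejects a start request when a row for that instance already exists. It illustrates the logic only; it is not something to run against IASDB.

```python
# Toy reproduction of the stray-row problem and its cleanup, using an
# in-memory SQLite table as a stand-in for ODS.ODS_PROCESS.
# Assumptions: the simplified columns (instance, pid, state), and that
# "oidctl ... start" refuses when a row for that instance already exists.
import sqlite3

def oidctl_start(conn: sqlite3.Connection, instance: int) -> str:
    """Hypothetical model of 'oidctl server=oidldapd ... instance=N start'."""
    taken = conn.execute(
        "SELECT 1 FROM ods_process WHERE instance = ?", (instance,)
    ).fetchone()
    if taken:
        return "*** Instance Number already in use. ***"
    conn.execute(
        "INSERT INTO ods_process (instance, pid, state) VALUES (?, NULL, 1)",
        (instance,),
    )
    conn.commit()
    return "started"

def cleanup(conn: sqlite3.Connection) -> list:
    """The SQL from the delete step: inspect the rows, delete them all, commit."""
    rows = conn.execute(
        "SELECT instance, pid, state FROM ods_process ORDER BY instance"
    ).fetchall()
    conn.execute("DELETE FROM ods_process")
    conn.commit()
    return rows

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ods_process (instance INTEGER PRIMARY KEY, pid INTEGER, state INTEGER)"
)
# Simulate the crash: OIDMON has gone, leaving a 'running' row behind.
conn.execute("INSERT INTO ods_process VALUES (1, 4242, 2)")
conn.commit()

print(oidctl_start(conn, 1))  # *** Instance Number already in use. ***
print(cleanup(conn))          # [(1, 4242, 2)]
print(oidctl_start(conn, 1))  # started
```

The pid 4242 is simulated. The point is the ordering: clear the stale row first, and only then issue the start request for instance 1.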
Some useful notes on this issue can be found on Metalink: