Best Practices for Monitoring Oracle Database

As an administrator, keeping a database running smoothly means satisfying not only the demands of users but also the requirements of senior management and the enterprise as a whole.

 

In an Oracle environment, this requires setting up an infrastructure for monitoring the system that hinges on the objectives laid down by the business, and which also meets the conditions of often strict Service Level Agreements (SLAs) for the user community. This guide should help.

An Overview of Oracle

An Oracle database consists of a collection of operating system files holding two kinds of content: data from applications or entered by users, and metadata describing the structure of the system itself.

 

In order for users or applications to access, view, and update information in the database, Oracle must start a set of background processes and allocate the memory which will be used during database operations. This combination of background processes and allocated memory is called an instance.

 

If you wish to read information from the database or write to it, an instance must be started; but an instance can run without a database being present. Oracle’s Real Application Clusters (RAC) is a feature whereby multiple instances can serve a single physical database concurrently.

 

Oracle is a relational database, consisting of information organised in tables of rows and columns, in which data may be stored, updated, and retrieved. Automatic Storage Management (ASM) is Oracle’s purpose-built integration of file system and volume manager. Disks can be added to or dropped from ASM storage one disk at a time, with the stored data rebalanced automatically.

 

The Oracle database server is the software which governs the database. It’s made up of logical and physical structures in which user, system, and control information is stored. The Oracle database system refers collectively to the software which runs Oracle and its physical database.

The Need for Monitoring

Every running Oracle instance maintains an alert log: a sequential stream of text messages about various Oracle operations, such as database start-up and shut-down, and major events like defining tablespaces and archiving logs. Certain types of errors are also catalogued.


Database administrators (DBAs) must keep an eye on Oracle operations if the performance demands of database users are to be continuously met. Different working environments will impose different demands, to which the monitoring of database performance must adapt.

 

For example, some clients may require that an Oracle database sustain a set number of transactions per second. Others may require database response times to be measured at set periods of the working day, or that database performance be proactively monitored in real time.

 

All these requirements may have to be met, to comply with the terms of a Service Level Agreement. And it’s up to the DBA to map out which performance aspects to measure, and how best to monitor them.

Developing a Strategy

First, determine the time frame. Real-time data will have to be collected for a reactive monitoring approach, one which can manage crisis situations as they occur.

 

For a predictive approach, an ongoing series of database “snapshots” may need to be taken, so that proactive measures can be put in place to ensure system performance, based on a global view of how instances behave in the course of normal operations.

 

Your Oracle database is only one part of a network infrastructure, so external factors in the system as a whole will affect it. Processor power, disk access speeds, and network availability all have an impact, and these will need to be monitored as well.

 

It’s easy to get lost in the multitude of statistics that Oracle provides – some of which are undocumented. You’ll need to decide which metrics to monitor to suit the needs of your enterprise, and which to provide reports for.

 

As an adjunct to this, you’ll have to determine the appropriate level of detail, or granularity, for measuring performance statistics and system health. Often, this will be a balancing act between obtaining enough data for a statistically viable sample and minimising the impact that the monitoring itself imposes on overall system performance.

The Oracle Tool-kit

The Automatic Workload Repository (AWR) collects snapshots at hourly intervals (by default) over a retention period determined by the database administrator. It’s built into the Oracle kernel, and imposes little overhead on your database operations during its monitoring cycle. AWR is a proactive monitoring tool for the longer term.

 

Active Session History (ASH) is an Oracle mechanism for collecting data in tandem with the AWR. ASH keeps a highly detailed record of session activity over short, recent periods (typically the last 30 minutes to an hour), again as configured by the DBA. ASH represents a short-term monitoring solution for reactive strategies.

 

Analysis of the data provided by ASH and AWR may help predict fluctuations in system performance, and future outages.

 

 


Oracle’s v$ performance views (also called dynamic performance views or tables) are built on Oracle’s x$ fixed tables, and continuously provide performance metrics and internal statistics. v$ data accumulates from instance start-up, so these views are best used over a set time period, with repeat samples taken at intervals.
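Because most v$ statistics are cumulative counters that grow from instance start-up, a single reading tells you little; the useful figure is the rate derived from the difference between two samples. A minimal sketch of that delta arithmetic, assuming the raw counter values have already been fetched (for example from repeated queries against v$sysstat):

```shell
# Convert two cumulative counter samples into a per-second rate.
# first/second are counter values; interval_sec is the gap between samples.
rate_per_sec() {
    first=$1
    second=$2
    interval_sec=$3
    echo $(( (second - first) / interval_sec ))
}
```

For example, two samples of a "user commits" counter taken 60 seconds apart, reading 1000 and 4000, give a rate of 50 commits per second.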

 

Oracle’s Automatic Database Diagnostic Monitor (ADDM) is a self-diagnostic engine built into the database. It’s set to run proactively by default, but may be configured as a reactive monitoring tool. ADDM provides impact and benefit analysis, identifies non-problem areas, and can help resolve past and current problems.

 

If you’re willing to spend a little extra, Oracle Enterprise Manager is the add-on monitoring suite of choice. Its Configuration Pack, Tuning Pack, and Performance Pack are available for Oracle deployments of version 10g and later.

Some Additional Tools

There are numerous third-party tools on the market.

 

Confio Ignite is a real-time monitor for Oracle which identifies performance issues without imposing a significant load on your database system.

 

Proactive tools like Ion-DBA and StatsPackAnalyzer (which is freeware) use data gathered by Oracle’s own AWR and STATSPACK to identify issues – again, without imposing a load on your production environment.

Buy or DIY?

There may be circumstances in which Oracle’s native tool-kit or third-party software aren’t enough – or cost too much. At this point, building a custom-made monitoring architecture is the database administrator’s only option.

 

The do-it-yourself option may also come into play if the enterprise or a client wishes to have metrics that would not be monitored under a more “conventional” set-up.

 

If you do decide to self-build, be sure to provide documentation to support the approach you’ve taken.

 

With a self-built monitoring structure, you’ll have the benefits of customisation, and a system capable of being enhanced or altered to suit changing conditions. However, this approach may be time-consuming to construct, and requires an intimate knowledge of the inner workings of Oracle.

Top Things to Monitor

Oracle’s Listener process is essential to maintaining the availability of your database. It handles new connection requests to the database as they occur – and if it goes down, access to the database is lost. At operating system level, the Listener may be monitored by checking for TNS process errors. In addition, the “lsnrctl” command-line utility reveals the status of the Listener, and the databases it’s listening for. The resulting log may be used in resolving access refusals and connectivity issues.
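A simple way to automate this is to capture the Listener status and scan it for TNS- error codes. The sketch below assumes `lsnrctl` is on the path and that errors appear as lines beginning with "TNS-"; the status-capture step is shown as a comment since it only works on a live Oracle host.

```shell
# Print any TNS- error lines found in the given log or status file;
# the function's exit status is non-zero when no errors are present.
check_listener_log() {
    grep 'TNS-' "$1"
}

# On a live host you would first capture current status, e.g.:
#   lsnrctl status > /tmp/listener_status.log
#   check_listener_log /tmp/listener_status.log && alert_the_dba
```

The `alert_the_dba` step is a placeholder for whatever notification mechanism your site uses.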

 

The “ORA-00020: maximum number of processes exceeded” error can cause user logins to hang. It can be especially tough to pin down, because the error may prevent you from logging in yourself to investigate. Your safest bet is to establish a performance threshold – say, 80% of the maximum. A monitoring alert at this level will give you time to establish what’s happening, and pre-emptively shut down some processes so that users aren’t completely locked out of the database.
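The 80% threshold logic can be sketched as below. The current and maximum process counts are assumed to have been fetched already, for example from a query against v$resource_limit for the 'processes' resource; the figures and the 80% default are illustrative.

```shell
# Alert when process usage reaches a percentage of the configured maximum.
# current = processes in use, max = the PROCESSES limit, threshold defaults to 80.
process_alert() {
    current=$1
    max=$2
    threshold=${3:-80}
    pct=$(( current * 100 / max ))
    if [ "$pct" -ge "$threshold" ]; then
        echo "ALERT: $pct% of max processes in use"
    else
        echo "OK: $pct% of max processes in use"
    fi
}
```

With a PROCESSES limit of 200, the alert would fire once 160 sessions are connected, leaving headroom to log in and investigate.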

 

Use the alert log or v$flash_recovery_area_usage to monitor your archive log space and recovery area, either of which may cause the system to hang if its storage limit is exceeded. A threshold of 80-85% for the archive log will give leeway to back up the archives and remove them to free up storage space. For the recovery area, Oracle itself sets a warning threshold at 85%, with 97% being the critical level.
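The warning/critical classification for the recovery area can be mirrored in a monitoring script like this. The percent-used figure is assumed to have been summed from v$flash_recovery_area_usage (v$recovery_area_usage in later releases); the 85% and 97% levels follow the Oracle defaults mentioned above.

```shell
# Classify recovery area usage against Oracle's default thresholds:
# 85% = warning, 97% = critical.
recovery_area_status() {
    pct=$1
    if [ "$pct" -ge 97 ]; then
        echo CRITICAL
    elif [ "$pct" -ge 85 ]; then
        echo WARNING
    else
        echo OK
    fi
}
```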

 

The size of your Oracle filesystem and the way its tablespaces are distributed should be monitored using thresholds based on acceptable percentages full. With automatic extension, tablespaces can grow very large – at which point a high percentage used may still leave large amounts of free space. A combination of size and percentage threshold monitoring is therefore advised, which may be implemented through simple shell scripts.
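One such simple shell script might combine the two thresholds as follows: alert only when a tablespace is both nearly full by percentage and genuinely short of absolute free space. The used/total figures are assumed to come from a query against dba_data_files and dba_free_space; the 90% and 1024 MB defaults are illustrative.

```shell
# Combined percentage + absolute-size check for a tablespace (sizes in MB).
# Alerts only when usage exceeds pct_threshold AND free space is below min_free_mb.
tablespace_alert() {
    used_mb=$1
    total_mb=$2
    pct_threshold=${3:-90}
    min_free_mb=${4:-1024}
    pct=$(( used_mb * 100 / total_mb ))
    free_mb=$(( total_mb - used_mb ))
    if [ "$pct" -ge "$pct_threshold" ] && [ "$free_mb" -lt "$min_free_mb" ]; then
        echo "ALERT: ${pct}% used, only ${free_mb} MB free"
    else
        echo "OK: ${pct}% used, ${free_mb} MB free"
    fi
}
```

Note how a 100 GB tablespace at 95% used still passes, because 5 GB of free space remains – exactly the case where a percentage threshold alone would misfire.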

 

At a more basic level, be sure to verify the availability of your database by logging in and performing a simple select operation. Assuming that process, tablespace, or archive log issues aren’t occurring, you can delve further by checking that smon, pmon, and the other background processes are up and running.
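Both checks can be scripted. The login probe is shown as a comment, since it needs a live database; the process check parses `ps`-style output for the core background processes, whose names conventionally take the form ora_&lt;process&gt;_&lt;SID&gt;. The SID value and the chosen process list are assumptions to adapt for your site.

```shell
# Default the SID for illustration; real scripts would take it from the environment.
: "${ORACLE_SID:=ORCL}"

# Availability probe (illustrative; requires a live database):
#   echo 'select 1 from dual;' | sqlplus -s / as sysdba

# Read ps-style output on stdin and print any core background
# process (smon, pmon, lgwr) that is absent for this SID.
missing_background_procs() {
    ps_output=$(cat)
    for proc in smon pmon lgwr; do
        echo "$ps_output" | grep -q "ora_${proc}_${ORACLE_SID}" || echo "$proc"
    done
}

# Typical usage: ps -ef | missing_background_procs
```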

A Check-list for Database Administrators

· Ensure that each Oracle instance is monitored for performance and file integrity.


· Monitor the System Global Area (SGA) – the shared memory structures holding collective data and control information for multiple users concurrently accessing a single Oracle database instance.

 

· Use Automatic Database Diagnostic Monitor (ADDM) to get a global view and diagnostics of database performance.

 

· Use Active Session History (ASH) for a more targeted analysis of database performance.

 

· Do a baseline analysis with the Automatic Workload Repository (AWR) to compare performance under varying conditions.

 

· Real-time SQL Monitoring should be used to determine how each SQL statement is being executed, at global, plan, and parallel execution levels.

 

· The SQL Performance Analyser (SPA) should be used to assess the impact of changes on SQL workloads – for example, refreshing optimiser statistics with the PENDING option before publishing them. The SQL Tuning Advisor will help optimise performance.

 

· Temporary Tablespaces should be managed locally. A single Temporary Tablespace should be reserved for the entire RAC database.

 

· Use local file headers to manage space, and reduce file fragmentation.

 

· Permanent Tablespaces should be handled using Automatic Segment Space Management, and Auto-Allocate.

 

· Remember to gather statistics for both user and dictionary objects. Statistics for optimisation should be gathered using automatic statistics collection.

 

· Build up a knowledge base, from your observations and actions in the monitoring “field”.

Be Accountable

Provide reports and documentation to support your monitoring approach. Have a valid reason for each monitoring action you take – and be prepared to defend it, to anyone who asks about it. When reporting to clients and stakeholders, be sure to specify exactly what’s being monitored – and what isn’t.

 

And be flexible. If management or a client wants to add something to your list of monitoring targets, be prepared to work it in.

Kerry is a published author and writer on all things tech, corporate tech, data centres, SEO, webdesign & more for some of the world’s leading sites.

