Case Study 1:
Database Health & Performance Check

1.Case Summary

1.1 Customer Profile

Customer Name: Large Australian Financial Institution

System: Oracle E-Business Suite

System Description:

A large Australian financial institution uses Oracle E-Business to manage the following operations:

•General Ledger

•Accounts Payable

•Human Resources\Payroll

1.2 Case Study Explanation

The application team responsible for supporting the Oracle E-Business finance system is preparing for the month end close processing. The period close cycle is governed by service level agreements and specific processing\reporting milestones need to be met. Failure to meet these milestones can result in consequences imposed by financial regulatory bodies. 

 

One of the pre-requisites of the month end closing cycle is to perform a complete application and database health\performance check. The team responsible for supporting the ORACLE E-Business database has been requested to perform a complete database health and performance check.

1.3 Required Actions\Deliverables

  1. Conduct the E-Business database health & performance check using the VITALS health check software.

  2. The scope of the heath & performance review is isolated to just the database. By using the health check functionality provided in Speedway identify areas of the database in need of attention. The criteria of information provided should focus on items which could cause performance and\or stability issues with the database during the month end close cycle.

  3. Recommend a high level course of action for remediating any areas in need of attention.

2.Result Summary

The following items were checked by the health & performance scan. It is recommended the appropriate action be taken as per below:

3.Running the Database Health & Performance Check

 

Step 1. Enter the login credentials to the database(SID is the alias from your tnsnames.ora)

  • Populate the SID, User and Pass fields and click the “Connect” button. Note that the SID should be an entry from your TNSNAMES.ORA file

  • Once connected select or deselect the items you wish to include in the health check.

  • Choose the correct source of information. This is done by selecting the “AWR” or “Statspack” radiobutton.

  • Click the “Run Health Check” button.

 

The health & performance process will cover 13 areas. By default every item in every area is included but can be excluded by un-checking the appropriate check box.

4.Verbose Result Analysis

The below verbose results are screenshots from all parts of the health & performance check. Each item shown below has commentary around performance advice and tuning. This information is available to the user when the information button  is pressed. There is an information button for every item in the results.

4.1 SQL Execution

The “SQL Execution” part of the health check will inspect the following areas of the database:

1. Execution Plan Flip Flops:

Execution plan flip flops occur due to underlying statistics of the object. In some cases the optimiser will generate a sub-optimal execution plan resulting in poor performance. Consider improving statistics or look at plan stability options such as SQL baseline\profiles.

2. PL/SQL Consumption:

Excessive CPU consumption can often result from poorly written PL/SQL. Look at improving PL/SQL code in the event  of high CPU consumption from PL/SQL code.

3. Uncommitted Transactions:

Uncommitted transactions acquire locking at the row and table level. Failure 'to commit or rollback transactions in a timely manner may cause subsequent 'issues for other database processes.

4. Long Operations:

Long operations are identified based on the amount of elapsed execution time. Consider tuning the underlying SQL of long operations.

5. SQL Hard Parses:

Hard parses result in additional resource consumption as the database accesses the data dictionary to confirm object and sql metadata. Consider investigating ways of reducing hard parsing by use of bind variables.

6. SQL Failed Parses:

Failed parses can result from invalid object names in queries. Start off by looking for invalid objects and orphaned synonyms.

4.2 Wait Events

The “Wait Event” part of the health check will inspect the following areas of the database:

7. Top Wait Events
Sort by "Total Waited" to find which wait events are consuming the most database time. Double click the wait event to find sessions attributed with the wait event.
8. Event\SQLID Breakdown
Sort by "Total Waited" to find which SQL ID is responsible for a particular wait event.
9. Buffer Busy Waits
Buffer busy waits are due to block contention. Oracles Active Session History has two fields that can be used to identify the reason for Buffer Busy Waits. Review DBA_HIST_ACTIVE_SESS_HISTORY.P3TEXT where event = "buffer busy waits"
Reason Codes in the p3text column elaborate on the root cause of the buffer busy wait.
10. Row Lock Waits
Row lock contention occurs when an update is unable to proceed due to another session update which has not yet been committed. Review application logic to 'avoid row update contention.
11. Trans Slot Waits
Consider increasing INITRANS setting if experiencing high transaction slot waits.

4.3 Top Consumers

The “Top Consumers” part of the health check will inspect the following areas of the database:

12. Top Physical I/O(By Object)
Excessive I/O can be reduced by various methods such as indexing, partitioning, KEEP_POOL, RESULT_POOL, SGA sizing.
13. Top Logical I/O(By Query)
Excessive I/O can be reduced by various methods such as indexing, partitioning, KEEP_POOL, RESULT_POOL, SGA sizing.
14. Top CPU(By Query)
Consider tuning queries using excessive CPU. Review all recommendations from the SQL tuning advisor.
15. Top Physical I/O(By Query)
Consider tuning queries using excessive I/O. Review all recommendations 'from the SQL tuning advisor.
16. Top Sorts(By Query)
Consider tuning queries using excessive sorts. Review all recommendations from the SQL tuning advisor.
17. Top Elapsed(By Query)
Consider tuning queries with excessive elapsed time. Review all recommendations from the SQL tuning advisor.
18. Top Table Scans(By Object)
Full table scans can result in excessive I/O and severely degrade performance. Try to reduce full table I/O scans by use of indexing and\or caching.
19. Top Table Scans(By Query)
Full table scans can result in excessive I/O and severely degrade performance. Try to reduce full table I/O scans by use of indexing and\or caching.

4.4 Database

The “Database” part of the health check will inspect the following areas of the database:

20.General Alerts
All database alerts should be analysed for the purpose of sustaining database integrity and performance.
21.Database High Water Marks
Peek resource consumption allows us to identify which parts of the database are subject to heavy utilization. This can resort in resource saturation and may require resource provisioning or tuning in specific areas.
22.Flashback Mode
Only enable flashback if there is a requirement to flashback to any previous point in time. If there is no requirement flashback will generate unnecessary I/O.
23.Force Logging Mode
FORCE LOGGING forces data changes to generate REDO regardless of any database objects declared as NOLOGGING. FORCE LOGGING is encouraged to guarantee all changes are synchronized between primary and standby when dataguard is in use.

4.5 Cluster

The “Cluster” part of the health check will inspect the following areas of the database:

24. Cluster Balance
The equal ratio\distribution of workload across all nodes in a cluster will help prevent resource depletion on any single node. Try and balance the workload across all nodes in the cluster.
25. Cluster Errors
There should be no errors with cluster communications. Any cluster errors need to be investigated from the database stack right through to the infrastructure\network stack.
26. Cluster Ping Statistics
Low latency communication between nodes in a cluster is important for performance. All average cluster latency between nodes should be less than 1 millisecond.

4.6 Indexes

The “Indexes” part of the health check will inspect the following areas of the database:

27. Invisible Indexes
Invisible indexes incur normal data maintenance during data manipulation but are not used for query operations. If the index in unintentionally marked as invisible either drop it or make it visible.
28. Tables with more than 5 indexes
Tables with more than 5 indexes will require more overhead than normal when data and associated indexes are updated. Try and reduce the number of indexes required on a table.
29. Tables with no indexes
Tables with no indexes are subject to full table scanning and excessive I/O. Investigate if the table and its usage will benefit from indexing.
30. Index Usage(Indexes not used in 60 days)
Indexes which are not used represent a performance overhead with data manipulation. Consider dropping indexes which are not used. Note that some indexes which are not used may be required to enforce data integrity rules(I.E: Primary keys).

4.7 REDO

The “REDO” part of the health check will inspect the following areas of the database:

31. Log Buffer Waits
Increase the size of the REDO log buffer if the average log buffer wait is less than 5000. There should next to no log buffer space waits. 
32. Log Buffer Latching
All miss ratios should be <= 1%. If greater than 1% then redo latch tuning is required.
33. Log Wait Times
If excessive AVG_WAIT_TIME review the log buffer size.
34. Log Switch Frequency
If excessive log switching then consider increasing the REDO log size. There should not me more than 1 log switch every 5 minutes.

4.8 Object

The “Object” part of the health check will inspect the following areas of the database:

35. Unusable Indexes
Unusable indexes are flagged as invalid and will not be picked by the optimizer resulting in sub-optimal execution plans. Unusable indexes should be rebuilt.
36. Invalid Objects
Invalid objects result in SQL execution failure and also contribute to failed parsing. Rebuild\recompile all invalid objects.
37. Disabled Constraints
Disabled constraints may allow data inserts to jeopardize data integrity. Enabled all constraints to enforce data integrity rules.
38. Non SYS/SYSTEM Objects In System
Storing non SYS/SYSTEM objects in system tablespaces can impact 'data dictionary operations and overall performance. Make sure all non SYS/SYSTEM objects are not stored in system tablespaces.
39. Missing Statistics
Missing statistics can severely impact system performance. Collect statistics appropriately to ensure the optimizer has the best information available to choose the optimum execution plan.
40. Stale Statistics
Stale statistics can severely impact system performance. Collect statistics appropriately to ensure the optimizer has the best available to choose the optimum execution plan.
41. Chained Rows
Chained rows can impact I/O operation performance. Consider reviewing block PCT settings and block size to reduce chained rows.
42. Orphaned Synonyms
Orphaned synonyms result in SQL execution failure and also contribute to failed parsing. Correct or remove orphaned synonyms.
43. Not Null Columns
Columns should be declared as NOT NULL where possible. This causes different behavior with the optimizer resulting in more optimum plans.
44. Locked Statistics
Locked statistics will force the optimizer to choose the same execution plan. Only use locked statistics if you are 100% sure the current execution plans are appropriate.
45. Locked Objects
Locked objects may prevent other operations. Whilst locking is normal careful  review should be taken on objects which are locked for extended periods of time.
46. Numbers With No Precision
Large floating point values stored in NUMBER datatypes with no precision can 'consume more storage than normal and possibly impact performance.

4.9 Capacity

The ninth phase of the health check will review performance information related to disk activity.

47. Tablespace Free
To avoid tablespace depletion consider adding more space to tablespaces that have less than 15% free space.
48. Extent Free
To avoid space depletion consider adding more extents to objects that have less than 15% free space.
49. Orphaned Tablespaces
Tablespaces with no default user may represent orphaned data and unnecessary space consumption
50. Database File Count
Insufficient slots for new datafiles can result in problems with database file creation. Increase the DB_FILES parameter if the number of existing files is approaching the DB_FILES value.
51. Active Restore Points
Active restore points retain archive log files which has the potential to fill up archive log space and freeze the database. Active restore points should be removed as soon as possible.
52. TEMP Usage
Consider adding more space to the TEMP tablespace if there is any evidence of more than 80% consumption. Alternatively identify and tune queries responsible for consuming excessive TEMP space.
53. UNDO Usage
Consider adding more space to the UNDO tablespace if there is any evidence of more than 80% consumption. Alternatively identify and tune queries responsible for consuming excessive UNDO space.

4.10 Hit Ratios

The “Hit Ratios” part of the health check will inspect the following areas of the database:

54. Data Dictionary
The data dictionary cache hit ratio should be > 90% and misses should be < 15%. If not within these thresholds consider increasing the SHARED_POOL_SIZE.
55. Library Cache(Summary)
The library cache hit ratio should be > 90% and misses should be < 1%. If not within these thresholds consider increasing the SHARED_POOL_SIZE.
56. Library Cache(Detail)
GETHITRATIO and PINHITRATIO should be more than 90%.
57. Shared Pool
%FREE should be > 0.5. REQUEST_FAILURES,REQUEST_MISSES should be 0 or near 0. If not within these thresholds consider increasing SHARED_POOL_RESERVED_SIZE and SHARED_POOL_SIZE.
58. Buffer Cache
The buffer cache hit ratio should be >90%. If not within this threshold consider increasing DB_CACHE_SIZE.

4.11 Memory

The “Memory” part of the health check will inspect the following areas of the database:

59. PGA Effectiveness
If Onepass or Multipass > 0 then increase PGA_AGG_TARGET. 'If excessive optimal executions then decrease PGA_AGG_TARGET.
60. Disk Sorts
Try to avoid sorts to disk. Identify top query sorts through the Speedway SQL topology and tune accordingly. 
61. Top Latches
Excessive latching can cause unnecessary CPU consumption and performance degradation. Look at tuning the top 5 latch events
62. Memory Resize Operations
Excessive memory resizing has an additional overhead. Avoid excessive memory resizing operations by maintaining tighter controls over SGA_MAX and SGA_TARGET parameters.
63. Result Cache Stats
Consider only including data sets in the results cache that are not subject to a lot of data manipulation.
64. PGA Consumption
Consider tuning specific SQL which are consuming the most PGA. PGA consumption is a result of join operations and sorting during a query.
65. Memory Advisers
The memory advisers should be reviewed to determine the benefit of resizing a certain memory region. Look at the size factor and estimated benefit% to weigh up the benefit.

4.12 File System

The “File System” part of the health check will inspect the following areas of the database:

66. CTRL File Multiplexing
Ensure control files are multiplexed.
67. CTRL File Retention
Ensure sufficient control file retention.
68. REDO LOG Multiplexing
Ensure REDO log files are multiplexed.
69. UNDO Activity
In the event of excessive UNDO time investigate the following areas:
DATA BLOCK: Check for SQL statements using unselective indexes.
UNDO HEADER: Consider using automatic segment-space management or add more rollback segments.
UNDO BLOCK: Consider using automatic segment-space management or make rollback segment sizes larger.
SEGMENT HEDER: Look for the offending segment and consider increasing free-lists.
70. ASM Diskgroup Statistics
Investigate storage layer performance issues if average speeds > 10ms.
71. Files in Backup Mode
Any datafile left in backup mode will not be modified. Instead changes are logged to the REDO stream. Make sure no datafiles are inadvertently left in 'backup mode.
72. Media Recovery
Any file in need of media recovery is inaccessible for general use. The file will will need to set offline or recovered.
73. Dataguard Gap Analysis
Dataguard is responsible for shipping transaction logs between the primary and standby database. Investigate the reason for any transaction gaps between the primary and standby database.

4.13 File Parameters

The “Parameters” part of the health check will inspect the following areas of the database:

74. Parameters Changed By User Since Startup
List of all parameter changes made by users since database startup. These can be system wide  or isolated to session level. Ensure parameter changes are conducted with the correct intent and are not causing a negative impact.  
75. Parameters Changed By System Since Startup
List of all parameter changes made by the system since database startup. Investigate why the database is automatically changing these parameters to avoid and negative impact.
76. Deprecated parameters
Deprecated parameters are no longer supported but are still provided for backward compatibility. Try and avoid the use of deprecated parameters.
77. Parameter Check
Certain parameters can have a big impact on database performance and integrity. This section raises awareness around important parameters.

Complete Database Performance Management