• Pavol Drotován
  • DB expert
  • 08-05-2024

Computing statistics - Oracle

Regular recalculation of statistics on database objects is a basic element of optimization in Oracle, as in other database systems. Statistics are an integral part of database administration work: if statistics are not refreshed on tables in which DML operations are performed, the optimizer's view of the data, primary keys and indexes becomes stale, and the DB system proposes a poor execution plan, which can radically slow down SQL queries and thereby degrade the performance of the whole database.
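To see the plan the optimizer actually proposes, and thus where stale statistics show up first, a quick check looks like this (a minimal sketch; the query itself is only a placeholder):

EXPLAIN PLAN FOR
SELECT * FROM sh.sales WHERE time_id > SYSDATE - 30;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);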

Because statistics are so important, some database systems recalculate them by default on a daily or weekly basis, at a time when the smallest database load is expected. The same cannot be said of Oracle, where the recalculation of statistics must be defined explicitly. Incorrect recalculation settings by database administrators (hereinafter DBAs) can therefore, over the long term, mean the collapse of database systems and the applications connected to them, so it is important to pay attention to this. (PostgreSQL, for example, already solves this with daily automatic vacuum jobs that keep the data current.)
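For comparison, on the PostgreSQL side the effect of autovacuum/autoanalyze can be verified directly from the standard statistics view:

-- PostgreSQL: when were tables last vacuumed and analyzed automatically?
SELECT relname, last_autovacuum, last_autoanalyze
  FROM pg_stat_user_tables
 ORDER BY last_autoanalyze DESC NULLS LAST;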

What causes problems

Most of the time, stale statistics are caused either by the failure of an already defined (usually weekend) statistics job to complete, or by an unplanned restart of the database, after which the database system has nothing from which to build an optimized execution plan. A further complication is that large tables and their partitions require more frequent recalculation, and that histograms and the exact percentage of rows to sample have to be set explicitly per table object.
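Oracle itself flags objects whose statistics it considers stale (by default, roughly 10% of rows changed), so a quick overview of problem candidates is available directly from the dictionary:

-- tables and partitions the optimizer already considers stale
SELECT owner, table_name, partition_name, stale_stats, last_analyzed
  FROM dba_tab_statistics
 WHERE stale_stats = 'YES';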

Basic checks

  1. Find out whether a job for recalculating statistics already exists:

    -- under the SYS account
    SELECT job_name, enabled
      FROM dba_scheduler_jobs
     WHERE job_name = 'GATHER_STATS_JOB';
  2. If yes, check whether it runs correctly; if not, or if no job is set up yet, schedule one on a daily basis or during the maintenance window (the latter is recommended).
    • DAILY

      BEGIN
        DBMS_SCHEDULER.create_job (
          job_name        => 'GATHER_STATS_JOB',
          job_type        => 'PLSQL_BLOCK',
          job_action      => 'BEGIN DBMS_STATS.GATHER_DATABASE_STATS; END;',
          start_date      => SYSTIMESTAMP,
          repeat_interval => 'FREQ=DAILY; BYHOUR=2;',
          enabled         => TRUE);
      END;
    • MAINTENANCE WINDOW

      BEGIN
        DBMS_SCHEDULER.create_job (
          job_name      => 'GATHER_STATS_JOB',
          job_type      => 'PLSQL_BLOCK',
          job_action    => 'BEGIN DBMS_STATS.GATHER_DATABASE_STATS; END;',
          -- the predefined maintenance window group; a plain 'MAINTENANCE_WINDOW' schedule does not exist by default
          schedule_name => 'SYS.MAINTENANCE_WINDOW_GROUP',
          enabled       => TRUE);
      END;
  3. Once the job is set up, check its runs for the first week. If it completes without errors, the statistics are covered for the time being; after that, check the run time from time to time.

SELECT job_name, status, actual_start_date, run_duration
  FROM dba_scheduler_job_run_details
 WHERE job_name = 'GATHER_STATS_JOB';
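If a run fails, or we do not want to wait for the next scheduled window, the job can also be triggered by hand:

-- run the job immediately in the current session
BEGIN
  DBMS_SCHEDULER.run_job('GATHER_STATS_JOB', use_current_session => TRUE);
END;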

We can also decide to create the job only for certain schemas or tables, or to write a customized procedure. For a schema it looks like this:


BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'GATHER_SCHEMA_STATS_JOB',
    job_type        => 'PLSQL_BLOCK',
    -- quotes inside job_action must be doubled
    job_action      => 'BEGIN DBMS_STATS.GATHER_SCHEMA_STATS(''SchemaName''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2;',
    enabled         => TRUE);
END;
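If recalculating the whole schema every night is too heavy, GATHER_SCHEMA_STATS can be limited to objects Oracle already flags as stale; a sketch of the same call with the standard OPTIONS parameter:

-- gather only objects whose statistics are marked stale
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'SchemaName', options => 'GATHER STALE');
END;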
For a table:

BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'GATHER_TABLE_STATS_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_STATS.GATHER_TABLE_STATS(''SH'',''SALES'', cascade => TRUE); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2;',
    enabled         => TRUE);
END;
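As noted above, histograms sometimes have to be requested explicitly; the METHOD_OPT parameter controls this. A minimal sketch (the PROMO_ID column is only an illustration of a skewed column):

-- histogram on a skewed column; SIZE AUTO lets Oracle pick the bucket count
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS('SH', 'SALES',
                                method_opt => 'FOR COLUMNS PROMO_ID SIZE AUTO',
                                cascade    => TRUE);
END;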
Customized procedure

BEGIN
  DBMS_SCHEDULER.create_job (
    job_name   => 'GATHER_STATS_JOB',
    job_type   => 'PLSQL_BLOCK',
    -- quotes inside job_action are doubled; DW_DATA is a placeholder schema
    job_action => '
declare
    p_est_percent integer;
    p_table_name  varchar2(128);
    p_check_queue integer;
begin
    for curr in (
        select m.table_name,
               m.partition_name,
               t.num_rows,
               t.last_analyzed,
               m.inserts,
               m.deletes,
               m.updates,
               case
                   when nvl(t.num_rows, 0) = 0 then 100
                   else 100 * (m.inserts + m.deletes) / t.num_rows
               end estimate_percent
          from dba_tab_modifications m
               join dba_tables t
                    on t.owner = m.table_owner
                   and t.table_name = m.table_name
               left outer join dba_queue_tables q
                    on q.queue_table = t.table_name
         where q.queue_table is null                  -- never queue tables
           and trunc(m.timestamp) = trunc(sysdate)    -- only changes recorded today
           and (t.last_analyzed is null               -- always when statistics were never gathered
                or (m.inserts + m.deletes > 0
                    and 100 * case
                                  when (m.inserts + m.deletes) > t.num_rows
                                      then t.sample_size / (m.inserts + m.deletes)
                                  else (m.inserts + m.deletes) / t.num_rows
                              end >= 5))              -- at least 5% of the rows changed
           and t.owner = ''DW_DATA''                  -- must match the owner passed to gather_table_stats below
         order by m.table_name)
    loop
        -- set 1 if the table is a queue table and skip it
        p_check_queue := 0;
        begin
            select 1
              into p_check_queue
              from dba_queue_tables q
             where q.queue_table = curr.table_name;
        exception
            when no_data_found then null;
        end;
        continue when p_check_queue = 1;

        begin
            -- cap the sampling percentage at 100
            p_est_percent := case when curr.estimate_percent > 100 then 100
                                  else curr.estimate_percent end;

            if curr.partition_name is null then
                -- whole table including indexes
                dbms_stats.gather_table_stats(''DW_DATA'', curr.table_name,
                                              estimate_percent => p_est_percent,
                                              cascade          => true);
                p_table_name := curr.table_name;
            elsif p_table_name is null or p_table_name <> curr.table_name then
                -- single partition, unless the whole table was gathered just above
                dbms_stats.gather_table_stats(''DW_DATA'', curr.table_name,
                                              partname         => curr.partition_name,
                                              estimate_percent => p_est_percent,
                                              cascade          => false);
            end if;
        exception
            when others then continue;
        end;
    end loop;
end;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2;',
    enabled         => TRUE);
END;
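If the schedule later proves too frequent or the runs too slow, the job does not have to be recreated; its attributes can be changed in place, for example:

-- move the job to a weekly schedule without dropping it
BEGIN
  DBMS_SCHEDULER.set_attribute('GATHER_STATS_JOB',
                               'repeat_interval',
                               'FREQ=WEEKLY; BYDAY=SUN; BYHOUR=2;');
END;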

If we don't want jobs, we can track the statistics manually:

SELECT t.table_name, num_rows, inserts, deletes, updates, last_analyzed, timestamp
  FROM dba_tables t
  JOIN dba_tab_modifications td
    ON td.table_name = t.table_name
   AND td.table_owner = t.owner
 WHERE t.owner = 'SH';
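Note that DBA_TAB_MODIFICATIONS is refreshed from in-memory DML monitoring data only periodically, so the counts above can lag. To see current numbers, the monitoring info can be flushed first:

-- push current DML monitoring counters into DBA_TAB_MODIFICATIONS
BEGIN
  DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
END;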
  • For the entire table, including all partitions and indexes:

    BEGIN DBMS_STATS.GATHER_TABLE_STATS('SH','SALES', cascade => TRUE); END;

  • For individual partitions, if we gather them separately (see also the incremental-statistics sketch after this list):

    BEGIN DBMS_STATS.GATHER_TABLE_STATS('SH','SALES','SALES_Q1_1998', cascade => FALSE); END;
    BEGIN DBMS_STATS.GATHER_TABLE_STATS('SH','SALES','SALES_Q3_1998', cascade => FALSE); END;

  • For only a percentage of the changed records (definitely recommended here). First calculate the percentage of changes:

    SELECT t.table_name, td.partition_name,
           ROUND((updates + deletes + inserts) / num_rows * 100, 2) pct_changed
      FROM dba_tables t
      JOIN dba_tab_modifications td
        ON td.table_name = t.table_name
       AND td.table_owner = t.owner
     WHERE t.owner = 'SH'
       AND t.table_name = 'SALES';

    BEGIN DBMS_STATS.GATHER_TABLE_STATS('SH','SALES', estimate_percent => 5, cascade => TRUE); END;

  • Or use the procedure above to recalculate all tables.
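For large partitioned tables such as SH.SALES, regathering global statistics after every partition change can be expensive. A minimal sketch of Oracle's incremental statistics preference (11g and later), which maintains global statistics from per-partition synopses:

-- maintain global stats incrementally; later gathers only touch changed partitions
BEGIN
  DBMS_STATS.SET_TABLE_PREFS('SH', 'SALES', 'INCREMENTAL', 'TRUE');
  DBMS_STATS.GATHER_TABLE_STATS('SH', 'SALES');
END;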

Summary

Statistics are an integral part of databases, and from a performance standpoint their recalculation should not be underestimated. It is best to set up the calculations as regular maintenance jobs and to follow their execution consistently. Much, of course, depends on the size of the tables: if the daily increment is large, it is ideal to schedule daily jobs or to build statistics gathering directly into processes, e.g. a DWH load or other more complex packages. It is also good to recalculate the statistics manually from time to time, or whenever one of the scheduled jobs fails.