Basic elements of optimization in Oracle, as in other database systems, include the regular recalculation of statistics on database objects. Statistics are an integral part of database administration: if statistics on tables subject to DML operations were not refreshed, the information about primary keys and indexes would become stale and the optimizer would propose a poor execution plan, which can radically slow down SQL queries and thereby degrade the performance of the whole database.
Because statistics are so important, some database systems refresh them by default on a daily or weekly basis, at times when the smallest database load is expected. This cannot be said of Oracle, where the recalculation of statistics has to be defined explicitly. Incorrect recalculation settings by database administrators (hereinafter DBAs) can therefore mean the long-term collapse of database systems and the applications connected to them, so it is important to pay attention to this. (PostgreSQL, for example, solves this with the autovacuum daemon, which keeps statistics current automatically.)
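Before defining custom jobs it is worth checking what the instance already does on its own; a minimal sketch, assuming Oracle 11g or later where the DBA_AUTOTASK_CLIENT view and the built-in statistics autotask exist:

```sql
-- Check whether the built-in "auto optimizer stats collection"
-- maintenance task is enabled on this instance.
SELECT client_name, status
  FROM dba_autotask_client
 WHERE client_name = 'auto optimizer stats collection';
```

If this task is disabled, or its maintenance windows do not match the application's load profile, explicit jobs like the ones below become necessary.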
Most of the time, missing statistics are caused either by an explicitly defined (usually weekend) statistics job failing to complete, or by an unplanned restart of the database, after which the database system has nothing from which to build an optimized execution plan. A common complication is that large tables and their partitions require more frequent recalculation of statistics, and that histograms and the exact sampling percentage have to be set explicitly for individual table objects.
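Tables with missing or stale statistics can be located directly from the data dictionary; a minimal sketch (the STALE_STATS column is available from Oracle 11g onwards, and 'SH' is a placeholder schema name):

```sql
-- Tables whose statistics are missing or flagged as stale.
SELECT owner, table_name, last_analyzed, stale_stats
  FROM dba_tab_statistics
 WHERE owner = 'SH'
   AND (stale_stats = 'YES' OR last_analyzed IS NULL);
```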
-- under the SYS account: check whether the job already exists and is enabled
SELECT job_name, enabled
  FROM dba_scheduler_jobs
 WHERE job_name = 'GATHER_STATS_JOB';
BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'GATHER_STATS_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_STATS.GATHER_DATABASE_STATS; END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2;',
    enabled         => TRUE);
END;
/
BEGIN
  DBMS_SCHEDULER.create_job (
    job_name      => 'GATHER_STATS_JOB',
    job_type      => 'PLSQL_BLOCK',
    job_action    => 'BEGIN DBMS_STATS.GATHER_DATABASE_STATS; END;',
    schedule_name => 'MAINTENANCE_WINDOW',
    enabled       => TRUE);
END;
/
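The schedule_name variant assumes a named schedule of that name already exists; a minimal sketch of creating one (MAINTENANCE_WINDOW and its timing are our own example here, not one of Oracle's predefined window groups):

```sql
BEGIN
  -- Create a reusable named schedule that jobs can then
  -- reference via the schedule_name parameter.
  DBMS_SCHEDULER.create_schedule (
    schedule_name   => 'MAINTENANCE_WINDOW',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=WEEKLY; BYDAY=SUN; BYHOUR=2;',
    comments        => 'Weekly low-load maintenance schedule');
END;
/
```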
-- check the execution history of the job
SELECT job_name, status, actual_start_date, run_duration
  FROM dba_scheduler_job_run_details
 WHERE job_name = 'GATHER_STATS_JOB';
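When a scheduled run fails, the job can also be driven by hand; a minimal sketch using standard DBMS_SCHEDULER calls:

```sql
-- Run the job immediately in the current session ...
BEGIN
  DBMS_SCHEDULER.run_job(job_name => 'GATHER_STATS_JOB',
                         use_current_session => TRUE);
END;
/

-- ... or remove it entirely when it is no longer needed.
BEGIN
  DBMS_SCHEDULER.drop_job(job_name => 'GATHER_STATS_JOB');
END;
/
```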
We can also decide to create a job only for certain schemas or tables, or to write a customized procedure. For a schema it looks like this:
BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'GATHER_SCHEMA_STATS_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_STATS.GATHER_SCHEMA_STATS(''SchemaName''); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2;',
    enabled         => TRUE);
END;
/
For tables:
BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'GATHER_TABLE_STATS_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN DBMS_STATS.GATHER_TABLE_STATS(''SH'', ''SALES'', cascade => TRUE); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2;',
    enabled         => TRUE);
END;
/
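The histogram and sampling settings mentioned earlier are controlled through the method_opt and estimate_percent parameters of DBMS_STATS; a minimal sketch (the column name PROD_ID is only an example of a skewed column):

```sql
BEGIN
  -- Gather table statistics with an explicit sampling percentage
  -- and a histogram of up to 254 buckets on one skewed column.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SH',
    tabname          => 'SALES',
    estimate_percent => 10,
    method_opt       => 'FOR COLUMNS PROD_ID SIZE 254',
    cascade          => TRUE);
END;
/
```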
A customized procedure:
BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'GATHER_STATS_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => q'[
declare
  p_est_percent integer;
  p_table_name  varchar2(128);
  p_check_queue integer;
begin
  for curr in (
    SELECT M.Table_Name,
           M.Partition_Name,
           T.Num_Rows,
           T.Last_Analyzed,
           M.Inserts,
           M.Deletes,
           M.Updates,
           CASE
             WHEN NVL(T.Num_Rows, 0) = 0 THEN 100
             ELSE 100 * (M.Inserts + M.Deletes) / T.Num_Rows
           END estimate_percent
      FROM dba_Tab_Modifications M
      JOIN dba_Tables T
        ON T.Table_Name = M.Table_Name
       AND T.Owner = M.Table_Owner
      LEFT OUTER JOIN dba_Queue_Tables Q
        ON Q.Queue_Table = T.Table_Name
     WHERE Q.Queue_Table IS NULL                -- never queue tables
       AND trunc(M.TIMESTAMP) = trunc(sysdate)  -- only today's modifications
       AND (T.Last_Analyzed IS NULL             -- always when statistics were never gathered
            OR (M.Inserts + M.Deletes > 0
                AND CASE
                      WHEN (M.Inserts + M.Deletes) > T.Num_Rows THEN
                        T.Sample_Size / (M.Inserts + M.Deletes)
                      ELSE
                        (M.Inserts + M.Deletes) / T.Num_Rows
                    END >= 0.05))               -- at least 5 percent of rows changed
       AND T.Owner = 'SCHEMA_NAME'
     ORDER BY M.Table_Name)
  loop
    -- set 1 if the table is a queue table and skip this step
    p_check_queue := 0;
    begin
      select 1 into p_check_queue
        from dba_Queue_Tables Q
       where Q.Queue_Table = curr.Table_Name;
    exception
      when no_data_found then null;
    end;
    continue when p_check_queue = 1;
    begin
      p_est_percent := least(curr.estimate_percent, 100);
      if curr.partition_name is null then
        DBMS_STATS.GATHER_TABLE_STATS('DW_DATA', curr.table_name,
          estimate_percent => p_est_percent, cascade => true);
        p_table_name := curr.table_name;
      elsif p_table_name is null or p_table_name <> curr.table_name then
        DBMS_STATS.GATHER_TABLE_STATS('DW_DATA', curr.table_name,
          partname => curr.partition_name,
          estimate_percent => p_est_percent, cascade => false);
      end if;
    exception
      when others then
        continue;  -- skip tables that fail and carry on with the rest
    end;
  end loop;
end;]',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2;',
    enabled         => TRUE);
END;
/
-- cross-check row counts against recorded modifications
select t.table_name, t.num_rows, td.inserts, td.deletes, td.updates, t.last_analyzed, td.timestamp
  from dba_tables t
  join dba_tab_modifications td
    on td.table_name = t.table_name
   and td.table_owner = t.owner
 where t.owner = 'SH';
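One caveat: DBA_TAB_MODIFICATIONS is populated from in-memory DML monitoring counters that Oracle flushes only periodically, so it can lag behind recent activity. A minimal sketch of forcing a flush before querying it:

```sql
BEGIN
  -- Push the pending DML monitoring counters into the dictionary
  -- so DBA_TAB_MODIFICATIONS reflects current activity.
  DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
END;
/
```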
Statistics are an integral part of databases, and their recalculation should not be underestimated where performance is concerned. It is best to set up the calculations as regular maintenance jobs and to monitor their execution consistently. Much, of course, depends on table sizes: if the daily increment is large, it is ideal to schedule daily jobs or to incorporate statistics gathering directly into processes, e.g. a DWH load or other more complex packages. It is also good to recalculate statistics manually from time to time, or whenever one of the scheduled jobs fails.
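For such an occasional manual run, letting Oracle choose the sample size itself is usually the safest default; a minimal sketch for a single schema:

```sql
BEGIN
  -- One-off manual gather; AUTO_SAMPLE_SIZE lets Oracle pick
  -- an appropriate sampling percentage on its own.
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname          => 'SH',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;
/
```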