Wednesday, July 5, 2017

A full table scan is not always bad, an index is not always good

This is a well known fact: (almost) every developer knows that "sometimes, a full table scan can perform better than using an index".

Lets show this in a little demonstration, which I picked up from Tom Kyte's Expert Oracle Database Architecture. The purpose of the demo is to show that an index access can be more expensive than a full table scan, under some circumstances.

I'm running this demo on a Oracle XE 11gR2 instance, running on a Lenovo X1 Yoga notebook, running Windows 10 (64 bits), with an Intel Core i7-6600U CPU (2.60 GHz), and 16 GB RAM.


Setup


Let's first create a table to hold data with a column having values sorted:

CREATE TABLE ORDERED (X NUMBER, Y VARCHAR2(2000), PRIMARY KEY(X));

Now populate ORDERED, taking care of sorting the data in the column X from 1 to 1'000'000. 

begin
  for i in  1 .. 1000000 
  loop
    insert into ordered 
    values(i, rpad(dbms_random.random, 75, '*'));
  end loop;
end;
/
commit;


... then create a table UNORDERED with identical structure, and populate it with the data from the first table, putting the values in the column X in random order:

CREATE TABLE UNORDERED AS SELECT X, Y FROM ORDERED ORDER BY Y;
ALTER TABLE UNORDERED ADD CONSTRAINT PK_UNORDERED PRIMARY KEY (X);

And before starting with the demo, gather the statistics:
begin
  dbms_stats.gather_table_stats(USER, 'UNORDERED');
  dbms_stats.gather_table_stats(USER, 'ORDERED');
end;
/

Comparing the execution plans

1. Table with ordered data:

SQL Statement:
$EXPLAIN PLAN FOR SELECT * FROM ORDERED WHERE X BETWEEN 10000 AND 30000;

------------------------------------------------------------------------------------------
|Id   | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             | 20002 |  1582K|   323   (0)| 00:00:04 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ORDERED     | 20002 |  1582K|   323   (0)| 00:00:04 |
|*  2 |   INDEX RANGE SCAN          | SYS_C007666 | 20002 |       |    68   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

2. Table with unordered data, full table scan:

SQL Statement:
EXPLAIN PLAN FOR SELECT * FROM UNORDERED WHERE X BETWEEN 10000 AND 30000;

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           | 20002 |  1582K|  3275   (1)| 00:00:40 |
|*  1 |  TABLE ACCESS FULL| UNORDERED | 20002 |  1582K|  3275   (1)| 00:00:40 |
-------------------------------------------------------------------------------

3. Table with unordered data, using the index:

SQL Statement:
EXPLAIN PLAN FOR SELECT /*+ index( UNORDERED PK_UNORDERED ) */* 
FROM UNORDERED WHERE X BETWEEN 10000 AND 30000;

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              | 20002 |  1582K| 20049   (1)| 00:04:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| UNORDERED    | 20002 |  1582K| 20049   (1)| 00:04:01 |
|*  2 |   INDEX RANGE SCAN          | PK_UNORDERED | 20002 |       |    44   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------


Conclusion


Execution plans displays a much higher cost when the index is used on an unordered set of data - in that particular case. This might be unexpected, but it shows how important physical design is. So blindly putting indexes when encountering full table scans is not always a good idea. 

Trying to change the optimizer first choice might also not be a good idea. Oddly, there's often in room a developer tempted to prevent the optimizer to  to use a full table scan. Not good.

Finally, let's also point out that maintaining an index is also bound to costs.