Hi all,
To follow up on my earlier remark: after reading the document 'Performance and Tuning Series: Improving Performance with Statistical Analysis' (doc ID DC00976-01-1570-01), I noticed the following information on page 29:
Statistics for column group: "pub_id", "type"
Last update of column statistics: Apr 8 2008 9:22:40:963AM
Range cell density: 0.0000000362887320
Total density: 0.0000000362887320
Range selectivity: default used (0.33)
In between selectivity: default used (0.25)
Unique range values: 0.0000000160149449
Unique total values: 0.0000000160149449
Average column width: default used (8.00)
Statistics for column group: "pub_id", "type", "pubdate"
Last update of column statistics: Apr 8 2008 9:22:40:963AM
Range cell density: 0.0000000358937986
Total density: 0.0000000358937986
Range selectivity: default used (0.33)
In between selectivity: default used (0.25)
Unique range values: 0.0000000158004305
Unique total values: 0.0000000158004305
Average column width: 2.0000000000000000
With 5000 rows in the table, the increasing precision of the optimizer’s
estimates of rows to be returned depends on the number of search arguments
used in the query:
• An equality search argument on only pub_id results in the estimate that
0.0335391029690461 * 5000 rows, or 168 rows, will be returned.
• Equality search arguments for all three columns result in the estimate that
0.0002011791956201 * 5000 rows, or only 1 row, will be returned.
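For reference, I assume the column-group statistics shown above were created with update statistics on the composite columns. The table would be titles from pubs2, as in the document's example (that part is my assumption), so something like:

  -- create/refresh the column-group statistics shown above
  -- (assuming pubs2..titles, the table used in the document's example)
  update statistics titles (pub_id, type)
  update statistics titles (pub_id, type, pubdate)
  go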
I'm wondering where the two values 0.0335391029690461 and 0.0002011791956201 come from.
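The multiplication itself is easy to check (plain arithmetic, nothing ASE-specific):

  -- quick check of the two row estimates quoted above
  select 0.0335391029690461 * 5000   -- about 167.7, the 168 rows in the text
  select 0.0002011791956201 * 5000   -- about 1.01, i.e. roughly 1 row
  go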
Let's assume the optimizer knows the number of rows that will be returned (statistics are up to date): how can I check whether using an index is cheaper than a table scan? What formula could I use to compute the cost of each access method, as an exercise to better understand the mechanism?
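This is roughly the kind of comparison I have in mind (my own back-of-envelope I/O counts, not the optimizer's actual cost model; the page and row counts below are made up, in practice I would plug in figures from sp_spaceused or optdiag):

  -- very rough I/O comparison, NOT the real ASE costing formula:
  -- table scan: read every data page once
  -- nonclustered index: descend the index, scan the matching leaf pages,
  -- then fetch one data page per qualifying row
  declare @data_pages int, @index_height int,
          @leaf_pages int, @matching_rows int
  select @data_pages = 500, @index_height = 3,
         @leaf_pages = 10, @matching_rows = 168    -- made-up figures
  select table_scan_ios = @data_pages,
         index_scan_ios = @index_height + @leaf_pages + @matching_rows
  go

Is this a reasonable first approximation, or is the real formula substantially different?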
Lastly, are there any trace flags or other diagnostics that would help me understand how the optimizer picks a plan?
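For context, the session-level diagnostics I already know about are below; I'm hoping there is something more detailed at the optimizer level. The set option show syntax is from memory and may not be exact; I understand it replaced the old dbcc traceon(302)/(310) optimizer trace flags from pre-15 releases.

  -- what I already use to look at the chosen plan and its I/O
  set showplan on
  set statistics io on
  set statistics time on
  set statistics plancost on   -- 15.x plan tree with estimated vs. actual rows and I/O
  go
  -- optimizer-level diagnostics I have seen mentioned for 15.x (syntax from memory)
  dbcc traceon(3604)           -- send diagnostic output to the session
  go
  set option show on
  go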
Thanks
Simon