From time to time, Oracle seems to prefer a MERGE JOIN CARTESIAN
operation over a regular MERGE JOIN
. Knowing the data and looking at concrete execution plans, I can see that this operation is usually not a problem, as one of the joined entities can return only exactly one record in the query at hand.
However, for historic reasons, our DBAs have a general distaste for cartesian products.
So I'd like to better analyse those cases and be backed up with documentation in my argumentation. Is there any official Oracle documentation about query transformation and the CBO where I can understand the cases when Oracle prefers the MERGE JOIN CARTESIAN
(or similar) operation?
In this case, I'm Using Oracle 11g (11.2.0.2.0)
UPDATE:
These are similar questions, but they don't explain why or when Oracle prefers the MJC
over a regular MERGE JOIN
:
Yes, the mention of cartesian joins usually makes a DBA's heart skip a beat. The cartesian joins caused by missing join conditions are definitely a pain to deal with - these are the types of joins that can "blow up" temp space and cause all types of alarms to go off.
I didn't find anything in Oracle's official 11g documentation on this particular join method but I did find plenty of articles about issues with it in their support DB. I have chased down a few of these in the past couple of weeks and here is what I found.
The source of the MJC is a CBO optimization. MJC is an optimization that works great when the cardinality of the result set being joined is low. The problem occurs when the Optimizer is not correctly estimating the cardinality of one or more of the result sets that are inputs to the join. If the estimated rows = 1 (or is a low number) but the actual rows for the result set is large then the optimizer may still choose a MJC resulting in a sub optimal plan. And that's an understatement. I have had issues with this happening and queries running for days and not finishing. After getting the CBO back on track they have run in seconds instead of hours or days.
The best way to find out if this Estimated Rows vs Actual Rows is the case is to run the query and view its execution plan statistics. You mentioned you are on 11g - use the SQL Monitoring feature. The output of this feature will show you how much time was spent on each step of your execution plan. It will also show you Estimated Rows vs Actual rows. You are looking for large discrepancies in Estimated Rows vs Actual Rows on the inputs for the MJC.
SQL Monitoring is available through OEM/DB Control, or you can use the API (search for DBMS_SQLTUNE.REPORT_SQL_MONITOR). The same sorts of info can be gathered using the GATHER_PLAN_STATISTICS hint with the query and then generating a report with DBMS_XPLAN... details are here to do that.
So how to get rid of it? Try to resolve object statistics issues. Once the CBO knows it is really dealing with hundreds, thousands, or millions of records as inputs to the join instead of '1' it should choose a join method more appropriate for the data set and not choose MJC. Easier said than done, books have been written on this topic but at least check out the basics - make sure all tables involved in the query at least have statistics. It may be possible to leverage Supplemental Statistics as well if you have multi-column expressions being applied in your where clause.
If you need a big hammer there are some hidden parameters that allow/disallow use of MJC. They can be implemented at a database level, at a session level, or query level (using hints). I'll leave the parameters names out as an exercise for the reader as Oracle's official stance is they should only be used under the direction of Support. Don't tell them but I have had some success eliminating MJC on the query level with an OPT_PARAM hint after attempts to get object statistics to cooperate failed.