Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 773 Vote(s) - 3.43 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Join vs. sub-query

#11
Subqueries have ability to calculate aggregation functions on a fly.
E.g. Find minimal price of the book and get all books which are sold with this price.
1) Using Subqueries:

SELECT titles, price
FROM Books, Orders
WHERE price =
(SELECT MIN(price)
FROM Orders) AND (Books.ID=Orders.ID);
2) using JOINs

SELECT MIN(price)
FROM Orders;
-----------------
2.99

SELECT titles, price
FROM Books b
INNER JOIN Orders o
ON b.ID = o.ID
WHERE o.price = 2.99;
Reply

#12
I think what has been under-emphasized in the cited answers is the issue of **duplicates** and problematic results that may arise from specific (use) cases.

(although Marcelo Cantos does mention it)

I will cite the example from Stanford's Lagunita courses on SQL.

###Student Table

+------+--------+------+--------+
| sID | sName | GPA | sizeHS |
+------+--------+------+--------+
| 123 | Amy | 3.9 | 1000 |
| 234 | Bob | 3.6 | 1500 |
| 345 | Craig | 3.5 | 500 |
| 456 | Doris | 3.9 | 1000 |
| 567 | Edward | 2.9 | 2000 |
| 678 | Fay | 3.8 | 200 |
| 789 | Gary | 3.4 | 800 |
| 987 | Helen | 3.7 | 800 |
| 876 | Irene | 3.9 | 400 |
| 765 | Jay | 2.9 | 1500 |
| 654 | Amy | 3.9 | 1000 |
| 543 | Craig | 3.4 | 2000 |
+------+--------+------+--------+

###Apply Table

(applications made to specific universities and majors)

+------+----------+----------------+----------+
| sID | cName | major | decision |
+------+----------+----------------+----------+
| 123 | Stanford | CS | Y |
| 123 | Stanford | EE | N |
| 123 | Berkeley | CS | Y |
| 123 | Cornell | EE | Y |
| 234 | Berkeley | biology | N |
| 345 | MIT | bioengineering | Y |
| 345 | Cornell | bioengineering | N |
| 345 | Cornell | CS | Y |
| 345 | Cornell | EE | N |
| 678 | Stanford | history | Y |
| 987 | Stanford | CS | Y |
| 987 | Berkeley | CS | Y |
| 876 | Stanford | CS | N |
| 876 | MIT | biology | Y |
| 876 | MIT | marine biology | N |
| 765 | Stanford | history | Y |
| 765 | Cornell | history | N |
| 765 | Cornell | psychology | Y |
| 543 | MIT | CS | N |
+------+----------+----------------+----------+

Let's try to find the GPA scores for students that have applied to `CS` major (regardless of the university)

***Using a subquery:***

select GPA from Student where sID in (select sID from Apply where major = 'CS');

+------+
| GPA |
+------+
| 3.9 |
| 3.5 |
| 3.7 |
| 3.9 |
| 3.4 |
+------+

The average value for this resultset is:

select avg(GPA) from Student where sID in (select sID from Apply where major = 'CS');

+--------------------+
| avg(GPA) |
+--------------------+
| 3.6800000000000006 |
+--------------------+

***Using a join:***

select GPA from Student, Apply where Student.sID = Apply.sID and Apply.major = 'CS';

+------+
| GPA |
+------+
| 3.9 |
| 3.9 |
| 3.5 |
| 3.7 |
| 3.7 |
| 3.9 |
| 3.4 |
+------+

average value for this resultset:

select avg(GPA) from Student, Apply where Student.sID = Apply.sID and Apply.major = 'CS';

+-------------------+
| avg(GPA) |
+-------------------+
| 3.714285714285714 |
+-------------------+


It is obvious that the second attempt yields misleading results in our use case, given that it counts duplicates for the computation of the average value.
It is also evident that usage of `distinct` with the join - based statement will **not** eliminate the problem, given that it will erroneously keep one out of three occurrences of the `3.9` score. The correct case is to account for **TWO (2)** occurrences of the `3.9` score given that we actually have **TWO (2)** students with that score that comply with our query criteria.

It seems that in some cases a sub-query is the safest way to go, besides any performance issues.

Reply

#13
I just thinking about the same problem, but I am using subquery in the FROM part.
I need connect and query from large tables, the "slave" table have 28 million record but the result is only 128 so small result big data! I am using MAX() function on it.

First I am using LEFT JOIN because I think that is the correct way, the mysql can optimalize etc.
Second time just for testing, I rewrite to sub-select against the JOIN.

LEFT JOIN runtime: 1.12s
SUB-SELECT runtime: 0.06s

18 times faster the subselect than the join! Just in the chokito adv. The subselect looks terrible but the result ...
Reply

#14
- A general rule is that **joins** are faster in most cases (99%).
- The more data tables have, the **subqueries** are slower.
- The less data tables have, the **subqueries** have equivalent speed as **joins**.
- The **subqueries** are simpler, easier to understand, and easier to read.
- Most of the web and app frameworks and their "ORM"s and "Active record"s generate queries with **subqueries**, because with **subqueries** are easier to split responsibility, maintain code, etc.
- For smaller web sites or apps **subqueries** are OK, but for larger web sites and apps you will often have to rewrite generated queries to **join** queries, especial if a query uses many **subqueries** in the query.

Some people say "some RDBMS can rewrite a **subquery** to a **join** or a **join** to a **subquery** when it thinks one is faster than the other.", but this statement applies to simple cases, surely not for complicated queries with **subqueries** which actually cause a problems in performance.
Reply

#15
If you want to speed up your query using join:

For "inner join/join",
Don't use where condition instead use it in "ON" condition.
Eg:

select id,name from table1 a
join table2 b on a.name=b.name
where id='123'

Try,

select id,name from table1 a
join table2 b on a.name=b.name and a.id='123'
For "Left/Right Join",
Don't use in "ON" condition, Because if you use left/right join it will get all rows for any one table.So, No use of using it in "On". So, Try to use "Where" condition
Reply

#16
**Taken from the MySQL manual** ([13.2.10.11 Rewriting Subqueries as Joins][1]):

> A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.

So subqueries can be slower than `LEFT [OUTER] JOIN`, but in my opinion their strength is slightly higher readability.

[1]:

[To see links please register here]

Reply

#17
As per my observation like two cases, if a table has less then 100,000 records then the join will work fast.

But in the case that a table has more than 100,000 records then a subquery is best result.

I have one table that has 500,000 records on that I created below query and its result time is like

SELECT *
FROM crv.workorder_details wd
inner join crv.workorder wr on wr.workorder_id = wd.workorder_id;

> Result : 13.3 Seconds

select *
from crv.workorder_details
where workorder_id in (select workorder_id from crv.workorder)

> Result : 1.65 Seconds
Reply

#18
It depends on several factors, including the specific query you're running, the amount of data in your database. Subquery runs the internal queries first and then from the result set again filter out the actual results. Whereas in join runs the and produces the result in one go.

The best strategy is that you should test both the join solution and the subquery solution to get the optimized solution.
Reply

#19
Subqueries are generally used to return a single row as an atomic value, though they may be used to compare values against multiple rows with the IN keyword. They are allowed at nearly any meaningful point in a SQL statement, including the target list, the WHERE clause, and so on. A simple sub-query could be used as a search condition. For example, between a pair of tables:


SELECT title
FROM books
WHERE author_id = (
SELECT id
FROM authors
WHERE last_name = 'Bar' AND first_name = 'Foo'
);

Note that using a normal value operator on the results of a sub-query requires that only one field must be returned. If you're interested in checking for the existence of a single value within a set of other values, use IN:

SELECT title
FROM books
WHERE author_id IN (
SELECT id FROM authors WHERE last_name ~ '^[A-E]'
);

This is obviously different from say a LEFT-JOIN where you just want to join stuff from table A and B even if the join-condition doesn't find any matching record in table B, etc.

If you're just worried about speed you'll have to check with your database and write a good query and see if there's any significant difference in performance.
Reply

#20
In the year 2010 I would have joined the author of this questions and would have strongly voted for `JOIN`, but with much more experience (especially in MySQL) I can state: Yes subqueries can be better. I've read multiple answers here; some stated subqueries are faster, but it lacked a good explanation. I hope I can provide one with this (very) late answer:

First of all, let me say the most important: **There are different forms of sub-queries**

And the second important statement: **Size matters**

If you use sub-queries, you should **be aware** of how the DB-Server executes the sub-query. Especially **if the sub-query is evaluated once or for every row!**
On the other side, a modern DB-Server is able to optimize a lot. In some cases a subquery helps optimizing a query, but a newer version of the DB-Server might make the optimization obsolete.

## Sub-queries in Select-Fields ##

SELECT moo, (SELECT roger FROM wilco WHERE moo = me) AS bar FROM foo

Be aware that a sub-query is executed for every resulting row from `foo`.
Avoid this if possible; it may drastically slow down your query on huge datasets. However, if the sub-query has no reference to `foo` it can be optimized by the DB-server as static content and could be evaluated only once.

## Sub-queries in the Where-statement ##

SELECT moo FROM foo WHERE bar = (SELECT roger FROM wilco WHERE moo = me)

If you are lucky, the DB optimizes this internally into a `JOIN`. If not, your query will become very, very slow on huge datasets because it will execute the sub-query for every row in `foo`, not just the results like in the select-type.

## Sub-queries in the Join-statement

SELECT moo, bar
FROM foo
LEFT JOIN (
SELECT MIN(bar), me FROM wilco GROUP BY me
) ON moo = me

This is interesting. We combine `JOIN` with a sub-query. And here we get the real strength of sub-queries. Imagine a dataset with millions of rows in `wilco` but only a few distinct `me`. Instead of joining against a huge table, we have now a smaller temporary table to join against. This can result in much faster queries depending on database size. You can have the same effect with `CREATE TEMPORARY TABLE ...` and `INSERT INTO ... SELECT ...`, which might provide better readability on very complex queries (but can lock datasets in a repeatable read isolation level).

## Nested sub-queries ##

SELECT VARIANCE(moo)
FROM (
SELECT moo, CONCAT(roger, wilco) AS bar
FROM foo
HAVING bar LIKE 'SpaceQ%'
) AS temp_foo
GROUP BY moo

You can nest sub-queries in multiple levels. This can help on huge datasets if you have to group or change the results. Usually the DB-Server creates a temporary table for this, but sometimes you do not need some operations on the whole table, only on the resultset. This might provide a much better performance depending on the size of the table.

## Conclusion ##

Sub-queries are no replacement for a `JOIN` and you should not use them like this (although possible). In my humble opinion, the correct use of a sub-query is the use as a quick replacement of `CREATE TEMPORARY TABLE ...`. A good sub-query reduces a dataset in a way you cannot accomplish in an `ON` statement of a `JOIN`. If a sub-query has one of the keywords `GROUP BY` or `DISTINCT` and is preferably not situated in the select fields or the where statement, then it might improve performance a lot.
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through