Posts: 0
Threads: 0
Joined: Nov 2020
Reputation:
0
Level: inf []
Total Points: inf
Rank nan / 1
100% to upload Level
Activity inf / 1
99% to upload your Rank
Experience nan
100% to upload Experience
Points: 50
|
I have this table for documents (simplified version here):
| id | rev | content |
|------|-------|--------------------------------------|
| 1 | 1 | ... |
| 2 | 1 | ... |
| 1 | 2 | ... |
| 1 | 3 | ... |
How do I select one row per id and only the greatest rev?
With the above data, the result should contain two rows: `[1, 3, ...]` and `[2, 1, ..]`. I'm using ***MySQL***.
Currently I use checks in the `while` loop to detect and over-write old revs from the resultset. But is this the only method to achieve the result? Isn't there a **SQL** solution?
|
Posts: 0
Threads: 0
Joined: Jun 2017
Reputation:
0
Level: inf []
Total Points: inf
Rank nan / 1
100% to upload Level
Activity inf / 1
99% to upload your Rank
Experience nan
100% to upload Experience
Points: 50
|
I can't vouch for the performance, but here's a trick inspired by the limitations of Microsoft Excel. It has some good features
**GOOD STUFF**
- It should force return of only one "max record" even if there is a tie (sometimes useful)
- It doesn't require a join
**APPROACH**
It is a little bit ugly and requires that you know something about the range of valid values of the ***rev*** column. Let us assume that we know the ***rev*** column is a number between 0.00 and 999 including decimals but that there will only ever be two digits to the right of the decimal point (e.g. 34.17 would be a valid value).
The gist of the thing is that you create a single synthetic column by string concatenating/packing the primary comparison field along with the data you want. In this way, you can force SQL's MAX() aggregate function to return all of the data (because it has been packed into a single column). Then you have to unpack the data.
Here's how it looks with the above example, written in SQL
SELECT id,
CAST(SUBSTRING(max(packed_col) FROM 2 FOR 6) AS float) as max_rev,
SUBSTRING(max(packed_col) FROM 11) AS content_for_max_rev
FROM (SELECT id,
CAST(1000 + rev + .001 as CHAR) || '---' || CAST(content AS char) AS packed_col
FROM yourtable
)
GROUP BY id
The packing begins by forcing the ***rev*** column to be a number of known character length regardless of the value of ***rev*** so that for example
- 3.2 becomes 1003.201
- 57 becomes 1057.001
- 923.88 becomes 1923.881
If you do it right, string comparison of two numbers should yield the same "max" as numeric comparison of the two numbers and it's easy to convert back to the original number using the substring function (which is available in one form or another pretty much everywhere).
|
Posts: 0
Threads: 0
Joined: Nov 2018
Reputation:
0
Level: inf []
Total Points: inf
Rank nan / 1
100% to upload Level
Activity inf / 1
99% to upload your Rank
Experience nan
100% to upload Experience
Points: 50
|
My preference is to use as little code as possible...
You can do it using `IN`
try this:
SELECT *
FROM t1 WHERE (id,rev) IN
( SELECT id, MAX(rev)
FROM t1
GROUP BY id
)
to my mind it is less complicated... easier to read and maintain.
|
Posts: 0
Threads: 0
Joined: Dec 2019
Reputation:
0
Level: inf []
Total Points: inf
Rank nan / 1
100% to upload Level
Activity inf / 1
99% to upload your Rank
Experience nan
100% to upload Experience
Points: 50
|
Yet another solution is to use a correlated subquery:
select yt.id, yt.rev, yt.contents
from YourTable yt
where rev =
(select max(rev) from YourTable st where yt.id=st.id)
Having an index on (id,rev) renders the subquery almost as a simple lookup...
Following are comparisons to the solutions in @AdrianCarneiro's answer (subquery, leftjoin), based on MySQL measurements with InnoDB table of ~1million records, group size being: 1-3.
While for full table scans subquery/leftjoin/correlated timings relate to each other as 6/8/9, when it comes to direct lookups or batch (`id in (1,2,3)`), subquery is much slower then the others (Due to rerunning the subquery). However I couldnt differentiate between leftjoin and correlated solutions in speed.
One final note, as leftjoin creates n*(n+1)/2 joins in groups, its performance can be heavily affected by the size of groups...
|
Posts: 0
Threads: 0
Joined: Aug 2021
Reputation:
0
Level: inf []
Total Points: inf
Rank nan / 1
100% to upload Level
Activity inf / 1
99% to upload your Rank
Experience nan
100% to upload Experience
Points: 50
|
This solution makes only one selection from YourTable, therefore it's faster. It works only for MySQL and SQLite(for SQLite remove DESC) according to test on sqlfiddle.com. Maybe it can be tweaked to work on other languages which I am not familiar with.
SELECT *
FROM ( SELECT *
FROM ( SELECT 1 as id, 1 as rev, 'content1' as content
UNION
SELECT 2, 1, 'content2'
UNION
SELECT 1, 2, 'content3'
UNION
SELECT 1, 3, 'content4'
) as YourTable
ORDER BY id, rev DESC
) as YourTable
GROUP BY id
|
Posts: 0
Threads: 0
Joined: Jun 2020
Reputation:
0
Level: inf []
Total Points: inf
Rank nan / 1
100% to upload Level
Activity inf / 1
99% to upload your Rank
Experience nan
100% to upload Experience
Points: 50
|
Here is a nice way of doing that
Use following code :
with temp as (
select count(field1) as summ , field1
from table_name
group by field1 )
select * from temp where summ = (select max(summ) from temp)
|
Posts: 0
Threads: 0
Joined: Sep 2021
Reputation:
0
Level: inf []
Total Points: inf
Rank nan / 1
100% to upload Level
Activity inf / 1
99% to upload your Rank
Experience nan
100% to upload Experience
Points: 50
|
I would use this:
select t.*
from test as t
join
(select max(rev) as rev
from test
group by id) as o
on o.rev = t.rev
Subquery SELECT is not too eficient maybe, but in JOIN clause seems to be usable. I'm not an expert in optimizing queries, but I've tried at MySQL, PostgreSQL, FireBird and it does work very good.
You can use this schema in multiple joins and with WHERE clause. It is my working example (solving identical to yours problem with table "firmy"):
select *
from platnosci as p
join firmy as f
on p.id_rel_firmy = f.id_rel
join (select max(id_obj) as id_obj
from firmy
group by id_rel) as o
on o.id_obj = f.id_obj and p.od > '2014-03-01'
It is asked on tables having teens thusands of records, and it takes less then 0,01 second on really not too strong machine.
I wouldn't use IN clause (as it is mentioned somewhere above). IN is given to use with short lists of constans, and not as to be the query filter built on subquery. It is because subquery in IN is performed for every scanned record which can made query taking very loooong time.
|
Posts: 0
Threads: 0
Joined: Jun 2018
Reputation:
0
Level: inf []
Total Points: inf
Rank nan / 1
100% to upload Level
Activity inf / 1
99% to upload your Rank
Experience nan
100% to upload Experience
Points: 50
|
I like to do this by ranking the records by some column. In this case, rank `rev` values grouped by `id`. Those with higher `rev` will have lower rankings. So highest `rev` will have ranking of 1.
select id, rev, content
from
(select
@rowNum := if(@prevValue = id, @rowNum+1, 1) as row_num,
id, rev, content,
@prevValue := id
from
(select id, rev, content from YOURTABLE order by id asc, rev desc) TEMP,
(select @rowNum := 1 from DUAL) X,
(select @prevValue := -1 from DUAL) Y) TEMP
where row_num = 1;
Not sure if introducing variables makes the whole thing slower. But at least I'm not querying `YOURTABLE` twice.
|
Posts: 0
Threads: 0
Joined: Dec 2022
Reputation:
0
Level: inf []
Total Points: inf
Rank nan / 1
100% to upload Level
Activity inf / 1
99% to upload your Rank
Experience nan
100% to upload Experience
Points: 50
|
If you have many fields in select statement and you want latest value for all of those fields through optimized code:
select * from
(select * from table_name
order by id,rev desc) temp
group by id
|
Posts: 0
Threads: 0
Joined: Aug 2020
Reputation:
0
Level: inf []
Total Points: inf
Rank nan / 1
100% to upload Level
Activity inf / 1
99% to upload your Rank
Experience nan
100% to upload Experience
Points: 50
|
Sorted the rev field in reverse order and then grouped by id which gave the first row of each grouping which is the one with the highest rev value.
SELECT * FROM (SELECT * FROM table1 ORDER BY id, rev DESC) X GROUP BY X.id;
Tested in [To see links please register here] with the following data
CREATE TABLE table1
(`id` int, `rev` int, `content` varchar(11));
INSERT INTO table1
(`id`, `rev`, `content`)
VALUES
(1, 1, 'One-One'),
(1, 2, 'One-Two'),
(2, 1, 'Two-One'),
(2, 2, 'Two-Two'),
(3, 2, 'Three-Two'),
(3, 1, 'Three-One'),
(3, 3, 'Three-Three')
;
This gave the following result in MySql 5.5 and 5.6
id rev content
1 2 One-Two
2 2 Two-Two
3 3 Three-Two
|
|