Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 443 Vote(s) - 3.47 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Counting DISTINCT over multiple columns

#1
Is there a better way of doing a query like this:

SELECT COUNT(*)
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems) AS internalQuery

I need to count the number of distinct items from this table but the distinct is over two columns.

My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)
Reply

#2
How about something like:

<pre>
select count(*)
from
(select count(*) cnt
from DocumentOutputItems
group by DocumentId, DocumentSessionId) t1

</pre>

Probably just does the same as you are already though but it avoids the DISTINCT.

Reply

#3
if you had only one field to "DISTINCT", you could use:

SELECT COUNT(DISTINCT DocumentId)
FROM DocumentOutputItems

and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:

SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId))
FROM DocumentOutputItems

but you'll have issues if NULLs are involved. I'd just stick with the original query.
Reply

#4
There's nothing wrong with your query, but you could also do it this way:

WITH internalQuery (Amount)
AS
(
SELECT (0)
FROM DocumentOutputItems
GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
FROM internalQuery
Reply

#5
What is it about your existing query that you don't like? If you are concerned that `DISTINCT` across two columns does not return just the unique permutations why not try it?

It certainly works as you might expect in Oracle.

SQL> select distinct deptno, job from emp
2 order by deptno, job
3 /

DEPTNO JOB
---------- ---------
10 CLERK
10 MANAGER
10 PRESIDENT
20 ANALYST
20 CLERK
20 MANAGER
30 CLERK
30 MANAGER
30 SALESMAN

9 rows selected.


SQL> select count(*) from (
2 select distinct deptno, job from emp
3 )
4 /

COUNT(*)
----------
9

SQL>

**edit**

I went down a blind alley with analytics but the answer was depressingly obvious...

SQL> select count(distinct concat(deptno,job)) from emp
2 /

COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
9

SQL>

**edit 2**

Given the following data the concatenating solution provided above will miscount:

col1 col2
---- ----
A AA
AA A

So we to include a separator...

select col1 + '*' + col2 from t23
/

Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.
Reply

#6
I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.

At first JayTee's answer seemed like a solution to me bu after some tests CHECKSUM() failed to create unique values. A quick example is, both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) gives the same answer which is 55.

Then I made some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums some suggested using

SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))

but this is also not conforting.

You can use HASHBYTES() function as suggested in

[To see links please register here]

. However this also has a small chance of not returning unique results.

I would suggest using

SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems
Reply

#7
Hope this works i am writing on prima vista

SELECT COUNT(*)
FROM DocumentOutputItems
GROUP BY DocumentId, DocumentSessionId
Reply

#8
I found this when I Googled for my own issue, found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL)

SELECT COUNT(DISTINCT DocumentID) AS Count1,
COUNT(DISTINCT DocumentSessionId) AS Count2
FROM DocumentOutputItems
Reply

#9
**Edit: Altered from the less-than-reliable checksum-only query**
I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable

SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutPutItems
Reply

#10
If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.

Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.

I believe a distinct count of the computed column would be equivalent to your query.
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through