
Is there a better way of doing a query like this:

SELECT COUNT(*)
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems) AS internalQuery

I need to count the number of distinct items from this table but the distinct is over two columns.

My query works fine, but I was wondering whether I can get the final result using just one query (without using a sub-query).

How about something like:

<pre>
select count(*)
from
(select count(*) cnt
from DocumentOutputItems
group by DocumentId, DocumentSessionId) t1

</pre>

This probably does the same thing as your existing query, but it avoids the DISTINCT.

If you had only one field to "DISTINCT", you could use:

SELECT COUNT(DISTINCT DocumentId)
FROM DocumentOutputItems

That returns the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However, you are using two fields, so you could try something crazy like:

SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId))
FROM DocumentOutputItems

but you'll have issues if NULLs are involved. I'd just stick with the original query.
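To make the NULL problem concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module instead of SQL Server, purely so the example is self-contained; SQLite's `||` stands in for T-SQL's `+` (by default both yield NULL when either operand is NULL, and COUNT(DISTINCT ...) skips NULLs). The helper name `count_pairs` and the sample rows are made up for illustration.

```python
import sqlite3


def count_pairs(rows):
    """Return (sub-query count, concatenation count) for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DocumentOutputItems "
        "(DocumentId INTEGER, DocumentSessionId INTEGER)"
    )
    conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

    # The original query: DISTINCT keeps pairs that contain NULL.
    subquery = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT DISTINCT DocumentId, DocumentSessionId"
        "  FROM DocumentOutputItems) AS internalQuery"
    ).fetchone()[0]

    # The concatenation trick: any NULL makes the whole string NULL,
    # and COUNT(DISTINCT ...) ignores NULL values.
    concatenated = conn.execute(
        "SELECT COUNT(DISTINCT CAST(DocumentId AS TEXT) || '|~|' || "
        "CAST(DocumentSessionId AS TEXT)) FROM DocumentOutputItems"
    ).fetchone()[0]

    conn.close()
    return subquery, concatenated


# Hypothetical data: three distinct pairs, two of them containing NULL.
print(count_pairs([(1, 10), (1, 10), (1, None), (2, None)]))  # (3, 1)
```

With no NULLs in either column the two counts agree, which is why the trick can look fine in testing and then undercount on real data.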

There's nothing wrong with your query, but you could also do it this way:

WITH internalQuery (Amount)
AS
(
SELECT (0)
FROM DocumentOutputItems
GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
FROM internalQuery
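For anyone who wants to check that the DISTINCT sub-query, the GROUP BY sub-query and the CTE version all agree, here is a rough sketch. It assumes Python's built-in sqlite3 module rather than SQL Server, so it says nothing about query plans or performance; the helper `run_all` and the sample rows are hypothetical.

```python
import sqlite3

# The three formulations from this thread, kept as close to the
# originals as SQLite allows.
QUERIES = {
    "distinct_subquery": """
        SELECT COUNT(*)
        FROM (SELECT DISTINCT DocumentId, DocumentSessionId
              FROM DocumentOutputItems) AS internalQuery
    """,
    "group_by_subquery": """
        SELECT COUNT(*)
        FROM (SELECT COUNT(*) AS cnt
              FROM DocumentOutputItems
              GROUP BY DocumentId, DocumentSessionId) AS t1
    """,
    "group_by_cte": """
        WITH internalQuery (Amount) AS (
            SELECT (0)
            FROM DocumentOutputItems
            GROUP BY DocumentId, DocumentSessionId
        )
        SELECT COUNT(*) AS NumberOfDistinctRows
        FROM internalQuery
    """,
}


def run_all(rows):
    """Run every query against an in-memory table holding `rows`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DocumentOutputItems "
        "(DocumentId INTEGER, DocumentSessionId INTEGER)"
    )
    conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)
    results = {name: conn.execute(sql).fetchone()[0]
               for name, sql in QUERIES.items()}
    conn.close()
    return results


# Hypothetical data with three distinct pairs: (1,10), (1,20), (2,10).
print(run_all([(1, 10), (1, 10), (1, 20), (2, 10), (2, 10)]))
```

All three entries should come back as 3 for this data.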

What is it about your existing query that you don't like? If you are concerned that `DISTINCT` across two columns does not return just the unique combinations, why not try it?

It certainly works as you might expect in Oracle.

SQL> select distinct deptno, job from emp
2 order by deptno, job
3 /

    DEPTNO JOB
---------- ---------
        10 CLERK
        10 MANAGER
        10 PRESIDENT
        20 ANALYST
        20 CLERK
        20 MANAGER
        30 CLERK
        30 MANAGER
        30 SALESMAN

9 rows selected.

SQL> select count(*) from (
2 select distinct deptno, job from emp
3 )
4 /

  COUNT(*)
----------
         9

SQL>

**edit**

I went down a blind alley with analytics but the answer was depressingly obvious...

SQL> select count(distinct concat(deptno,job)) from emp
2 /

COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
                                9

SQL>

**edit 2**

Given the following data the concatenating solution provided above will miscount:

col1 col2
---- ----
A    AA
AA   A

So we need to include a separator...

select col1 + '*' + col2 from t23
/

Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.
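Here is a small sketch of both points: the miscount without a separator, and the miscount that comes back if the separator can appear in the data. It assumes Python's built-in sqlite3 module and SQLite's `||` concatenation operator in place of `+`; the helper `concat_counts` is made up for illustration, while `t23`, `col1` and `col2` are the names used above.

```python
import sqlite3


def concat_counts(rows, sep="*"):
    """Return (count without separator, count with separator)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t23 (col1 TEXT, col2 TEXT)")
    conn.executemany("INSERT INTO t23 VALUES (?, ?)", rows)

    # Plain concatenation: 'A' || 'AA' and 'AA' || 'A' are both 'AAA'.
    plain = conn.execute(
        "SELECT COUNT(DISTINCT col1 || col2) FROM t23"
    ).fetchone()[0]

    # Concatenation with a separator between the two columns.
    separated = conn.execute(
        "SELECT COUNT(DISTINCT col1 || ? || col2) FROM t23", (sep,)
    ).fetchone()[0]

    conn.close()
    return plain, separated


# The data from the example above: two distinct rows.
print(concat_counts([("A", "AA"), ("AA", "A")]))    # (1, 2)

# Two distinct rows again, but the separator occurs in the data,
# so both rows become 'A**B' and the separated count is wrong too.
print(concat_counts([("A*", "B"), ("A", "*B")]))    # (1, 1)
```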

I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.

At first JayTee's answer seemed like a solution to me, but after some tests CHECKSUM() failed to create unique values. A quick example: both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) give the same answer, which is 55.

Then I did some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums, people suggested using

SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))

but this is also not comforting.

You can use the HASHBYTES() function instead, as has been suggested elsewhere. However, this also has a small chance of not returning unique results.

I would suggest using

SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems

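I hope this works; I am writing this from memory. Note that it returns one row per (DocumentId, DocumentSessionId) pair, so the distinct count is the number of rows returned rather than a single value: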

SELECT COUNT(*)
FROM DocumentOutputItems
GROUP BY DocumentId, DocumentSessionId
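A quick way to see what this bare GROUP BY query actually returns is the sketch below. It assumes Python's built-in sqlite3 module rather than SQL Server, and the helper `group_counts` plus the sample rows are hypothetical.

```python
import sqlite3


def group_counts(rows):
    """Return the per-group row counts produced by the GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DocumentOutputItems "
        "(DocumentId INTEGER, DocumentSessionId INTEGER)"
    )
    conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

    # One result row per (DocumentId, DocumentSessionId) group; each value
    # is the number of table rows in that group.
    result = conn.execute(
        "SELECT COUNT(*) FROM DocumentOutputItems "
        "GROUP BY DocumentId, DocumentSessionId"
    ).fetchall()

    conn.close()
    return [row[0] for row in result]


counts = group_counts([(1, 10), (1, 10), (1, 20), (2, 10), (2, 10)])
print(len(counts))  # number of distinct pairs: 3
```

So the distinct count is the length of the result set, which still has to be counted somewhere, either by the client or by wrapping the query in an outer COUNT(*).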

I found this when I Googled for my own issue. I found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL):

SELECT COUNT(DISTINCT DocumentID) AS Count1,
COUNT(DISTINCT DocumentSessionId) AS Count2
FROM DocumentOutputItems
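One thing to watch with this query: it counts the distinct values of each column separately, which is not the same as counting distinct (DocumentId, DocumentSessionId) pairs. The sketch below illustrates the difference; it assumes Python's built-in sqlite3 module rather than MySQL, and the helper `separate_vs_pair` and the sample rows are made up.

```python
import sqlite3


def separate_vs_pair(rows):
    """Return (Count1, Count2, number of distinct pairs)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DocumentOutputItems "
        "(DocumentId INTEGER, DocumentSessionId INTEGER)"
    )
    conn.executemany("INSERT INTO DocumentOutputItems VALUES (?, ?)", rows)

    # Distinct values per column, as in the query above.
    count1, count2 = conn.execute(
        "SELECT COUNT(DISTINCT DocumentId) AS Count1, "
        "COUNT(DISTINCT DocumentSessionId) AS Count2 "
        "FROM DocumentOutputItems"
    ).fetchone()

    # Distinct pairs, as in the original question.
    pairs = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT DISTINCT DocumentId, DocumentSessionId"
        "  FROM DocumentOutputItems) AS internalQuery"
    ).fetchone()[0]

    conn.close()
    return count1, count2, pairs


# Two ids and two session ids, but four distinct combinations.
print(separate_vs_pair([(1, 10), (1, 20), (2, 10), (2, 20)]))  # (2, 2, 4)
```

The per-column counts only match the pair count in special cases, for example when each DocumentId appears with exactly one DocumentSessionId.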

**Edit: Altered from the less-than-reliable checksum-only query**
I've discovered a way to do this (in SQL Server 2005) that works pretty well for me, and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable:

SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutputItems

If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.

Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.

I believe a distinct count of the computed column would be equivalent to your query.
