0

I want to query a table for where the number of identical values in one column is equal to the number distinct values in another column. I'm not sure how to phrase that better, so I think it best to illustrate:

Tables

Left one represent the table I have, and the right represents what I want it reduced to. 'Time' column is timestamps, and 'parts' are the name of the devices that are providing the values on each timestamp. In this example there are only 4 distinct parts: A, B, C and D (but there could be more, there could be less).

Every part is supposed to always report a value on a timestamp, but sometimes they don't all do it. I only want to keep the rows where all have reported. When that occur, the number of identical timestamps equals number of unique parts. In this example, that occurs only at t2, t4, t8 and t10.

I know how to get the query to recognize that there are 4 distinct parts, but I don't know how to recognize that there are four t2-s, four t4-s and so on. And then compare that info and make a condition to only give me those. Would it be possible?

So far:

  • I've only created the table at fiddle, but as for an actual query I'm stumped.

  • I'm currently trying to store the number of distinct parts as a variable (to be used for setting up a condition later), but struggling with it. I've seen some PL/pgSQL methods for this, but I can't seem to get them to work. I'm unsure how to use/"place them" alongside a standard Postgres query. I might try using a CTE as "variable" instead.

  • As for the problem of comparing the number of distinct to multiple groups of duplicates within another column, I have no ideas - not even what such a "maneuver" would be called.

3 Answers 3

0

No PL/pgSQL and no CTEs are needed for this. To count the number of distinct parts per timestamp, just use a GROUP BY clause:

SELECT *
FROM testTable
WHERE test_time IN (
  SELECT test_time
  FROM testTable
  GROUP BY test_time
  HAVING COUNT(DISTINCT test_part) = (SELECT COUNT(DISTINCT test_part) FROM testTable)
);

(online demo)

6
  • Some other approaches: dbfiddle.uk/KZzrndWO
    – Bergi
    Commented Jun 13 at 12:32
  • Wow, so simple... Thank you so much! :) If you ever want a tutorial on how to overcomplicate problems in your head, give me a call lol ¯_(ツ)_/¯
    – kreaturen
    Commented Jun 13 at 13:11
  • Thank you for the additional examples as well :)
    – kreaturen
    Commented Jun 13 at 13:14
  • May I ask a follow up about your 2nd alternate solution (The one with nested NOT EXISTS)? I understand what the innermost checks for, but what does the outer one evaluate? Tried tracing what's going on, but since nests grow "so big" and I'm unsure what the outer one actually evaluates, I can't wrap my head around how the whole thing reaches the desired outcome. I suppose it evaluates different iterations of 'p', but I still can't quite grasp it...
    – kreaturen
    Commented Jun 21 at 8:41
  • There should not be in the entire test_table another (fifth, in your example) part for which there exists no reading t3 for the same test_time as that of the t1 row being filtered. Expressed more clearly by flipping quantifiers, it takes only rows t1 where for all (four) different parts in the entire table there does exist a t3 reading on the test_time of t1.
    – Bergi
    Commented Jun 21 at 14:17
0

Steps

  1. Numerate (distinct = dense_rank()) parts for each test_time tn.
  2. Take max number of part for each test_time. This is max tn over test_time maxN.
  3. Take max number of part for each test_time for overall table.
  4. Filter rows (test_time) where maxN=maxSensorCnt.

See example

select *
from ( -- count max number of different parts for each time
       -- and count number of different parts overall
  select *
    ,max(tn)over(partition by test_time) maxN
    ,max(tn)over() maxSensorCnt
  from(  -- numerate different parts (sensors) for each test_time
  select *
    ,dense_rank()over(partition by test_time order by test_part) tn
  from testTable
 )tr
)tm
where maxn=maxsensorcnt

Demo

Other example:

  1. We have sensors list in testTable as
(select string_agg(distinct test_part, ',' order by test_part) SensorList
 from testTable)
  1. We have sensor list for each test_time when grouping by test_time
string_agg(test_part,',' order by test_part) 

So, filter rows

select * 
from testTable
where test_time in 
  ( select test_time from testTable group by test_time
    having string_agg(test_part,',' order by test_part)
      =(select string_agg(distinct test_part, ',' order by test_part) SensorList
        from testTable)
  )

Example

1
  • Really appreciate it. It indroduces new table operations for me, so I will have to study it. It looks very general, and applicable to any number of sensors, regardless of their names, which is great. Thank you :)
    – kreaturen
    Commented Jun 14 at 8:24
0

If we can assume that ...

  • no sensor has more than one entry per timestamp.
  • only the four given sensors in the example can have entries.
  • test_time and test_part are defined NOT NULL

Then the query can be even simpler & cheaper, we only need the row count per test_time:

SELECT (t).*
FROM  (
   SELECT t, count(*) OVER (PARTITION BY test_time) AS ct
   FROM   tbl t
-- ORDER  BY test_time, test_part  -- optional
   ) sub
WHERE  ct = 4;

fiddle

A PK and a CHECK constraint enforce above assumptions:

ALTER TABLE tbl
  ADD PRIMARY KEY (test_time, test_part)
, ADD CONSTRAINT test_part_valid CHECK (test_part IN ('sensor_A', 'sensor_B', 'sensor_C', 'sensor_D'))
;

I select the whole row via table alias t in the subquery and unnest it in the outer SELECT. Saves me the hassle to spell out all columns while excluding the (noise) count from the result.

1
  • Really appreciate it. This indroduces a bunch of new operations for me, and always great to see working examples of that. Never even seen PARTITION BY before. I'm going to try the answers I've been given on a real database of machine data though, and the only thing I cannot assume (or predefine) is the number or sensors, or what they are called. That will vary, depending on machine checked. Still, a really cool answer.
    – kreaturen
    Commented Jun 14 at 8:11

Not the answer you're looking for? Browse other questions tagged or ask your own question.