Estafette
Compose Login
You are browsing eu.zone1 in read-only mode. Log in to participate.
rss-bridge 2022-06-10T14:32:38+00:00

How to Filter a SQL Nested Collection by a Value

I stumbled upon a very interesting question on Stack Overflow about how to use jOOQ’s MULTISET operator to nest a collection, and then filter the result by whether that nested collection contains a value. The question is jOOQ specific, but imagine, you have a query that nests collections using JSON in PostgreSQL. Assuming, as always, … Continue reading How to Filter a SQL Nested Collection by a Value →

---

How to Filter a SQL Nested Collection by a Value

Posted on June 10, 2022June 10, 2022 by lukaseder

I stumbled upon a very interesting question on Stack Overflow about how to use jOOQ’s MULTISET operator to nest a collection, and then filter the result by whether that nested collection contains a value.

The question is jOOQ specific, but imagine, you have a query that nests collections using JSON in PostgreSQL. Assuming, as always, the Sakila database. Now, PostgreSQL doesn’t support the SQL standard MULTISET operator, but we can use ARRAY, which works almost the same way.

`

SELECT
f.title,
ARRAY(
JOIN film_actor AS fa USING (actor_id)
WHERE fa.film_id = f.film_id
ORDER BY a.actor_id
FROM film AS f

`

This produces all films and their actors as follows (I’ve truncated the arrays for readability purposes. You get the point):

`
title |array
---------------------------+--------------------------------------------------------------------------------------
ACADEMY DINOSAUR |{"(1,PENELOPE,GUINESS)","(10,CHRISTIAN,GABLE)","(20,LUCILLE,TRACY)","(30,SANDRA,PECK)"
ACE GOLDFINGER |{"(19,BOB,FAWCETT)","(85,MINNIE,ZELLWEGER)","(90,SEAN,GUINESS)","(160,CHRIS,DEPP)"}
ADAPTATION HOLES |{"(2,NICK,WAHLBERG)","(19,BOB,FAWCETT)","(24,CAMERON,STREEP)","(64,RAY,JOHANSSON)","(1
AFFAIR PREJUDICE |{"(41,JODIE,DEGENERES)","(81,SCARLETT,DAMON)","(88,KENNETH,PESCI)","(147,FAY,WINSLET)"
AFRICAN EGG |{"(51,GARY,PHOENIX)","(59,DUSTIN,TAUTOU)","(103,MATTHEW,LEIGH)","(181,MATTHEW,CARREY)"
AGENT TRUMAN |{"(21,KIRSTEN,PALTROW)","(23,SANDRA,KILMER)","(62,JAYNE,NEESON)","(108,WARREN,NOLTE)",
AIRPLANE SIERRA |{"(99,JIM,MOSTEL)","(133,RICHARD,PENN)","(162,OPRAH,KILMER)","(170,MENA,HOPPER)","(185
AIRPORT POLLOCK |{"(55,FAY,KILMER)","(96,GENE,WILLIS)","(110,SUSAN,DAVIS)","(138,LUCILLE,DEE)"}
ALABAMA DEVIL |{"(10,CHRISTIAN,GABLE)","(22,ELVIS,MARX)","(26,RIP,CRAWFORD)","(53,MENA,TEMPLE)","(68,
`

Now, the question on Stack Overflow was, how to filter this result by whether the ARRAY (or MULTISET) contains a specific value.

#### Filtering the ARRAY

We can’t just add a WHERE clause to the query. Because of the logical order of operations in SQL, the WHERE clause “happens before” the SELECT clause, so the ARRAY is not yet available to WHERE. We could, however, wrap everything in a derived table and do this, instead:

`

SELECT *
FROM (
SELECT
JOIN film_actor AS fa USING (actor_id)
WHERE fa.film_id = f.film_id
ORDER BY a.actor_id
) AS actors
SELECT ROW(a.actor_id, a.first_name, a.last_name)
FROM actor AS a
WHERE a.actor_id = 1
ORDER BY f.title

`

Excuse the unwieldy ARRAY @> ARRAY operator. I’m not aware of a better approach here, because it’s hard to unnest a structurally typed RECORD[] array in PostgreSQL, if we don’t use a nominal type (CREATE TYPE ...). If you know a better way to filter, please let me know in the comments section. Here’s a better version:

`

SELECT *
FROM (
SELECT
JOIN film_actor AS fa USING (actor_id)
WHERE fa.film_id = f.film_id
ORDER BY a.actor_id
) AS actors
FROM unnest(actors) AS t (a bigint, b text, c text)
WHERE a = 1
ORDER BY f.title

`

Anyway, this produces the desired result:

`
title |actors
---------------------+-------------------------------------------------------------------------------------------------
ACADEMY DINOSAUR |{"(1,PENELOPE,GUINESS)","(10,CHRISTIAN,GABLE)","(20,LUCILLE,TRACY)","(30,SANDRA,PECK)","(40,JOHNN
ANACONDA CONFESSIONS |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(22,ELVIS,MARX)","(150,JAYNE,NOLTE)","(164,HUMPHREY
ANGELS LIFE |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(7,GRACE,MOSTEL)","(47,JULIA,BARRYMORE)","(91,CHRIS
BULWORTH COMMANDMENTS|{"(1,PENELOPE,GUINESS)","(65,ANGELA,HUDSON)","(124,SCARLETT,BENING)","(173,ALAN,DREYFUSS)"}
CHEAPER CLYDE |{"(1,PENELOPE,GUINESS)","(20,LUCILLE,TRACY)"}
COLOR PHILADELPHIA |{"(1,PENELOPE,GUINESS)","(106,GROUCHO,DUNST)","(122,SALMA,NOLTE)","(129,DARYL,CRAWFORD)","(163,CH
ELEPHANT TROJAN |{"(1,PENELOPE,GUINESS)","(24,CAMERON,STREEP)","(37,VAL,BOLGER)","(107,GINA,DEGENERES)","(115,HARR
GLEAMING JAWBREAKER |{"(1,PENELOPE,GUINESS)","(66,MARY,TANDY)","(125,ALBERT,NOLTE)","(143,RIVER,DEAN)","(155,IAN,TANDY
`

Now, all the results are guaranteed to be films in which 'PENELOPE GUINESS' was an ACTOR. But is there a better solution?

#### Using ARRAY_AGG instead

However, in native PostgreSQL, it would be better (in this case) to use ARRAY_AGG, I think:

`

SELECT
f.title,
ARRAY_AGG(ROW(
) ORDER BY a.actor_id) AS actors
FROM film AS f
JOIN film_actor AS fa USING (film_id)
JOIN actor AS a USING (actor_id)
GROUP BY f.title
HAVING bool_or(true) FILTER (WHERE a.actor_id = 1)
ORDER BY f.title

`

This produces the exact same result:

`
title |actors
---------------------+------------------------------------------------------------------------------------------------
ACADEMY DINOSAUR |{"(1,PENELOPE,GUINESS)","(10,CHRISTIAN,GABLE)","(20,LUCILLE,TRACY)","(30,SANDRA,PECK)","(40,JOHN
ANACONDA CONFESSIONS |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(22,ELVIS,MARX)","(150,JAYNE,NOLTE)","(164,HUMPHRE
ANGELS LIFE |{"(1,PENELOPE,GUINESS)","(4,JENNIFER,DAVIS)","(7,GRACE,MOSTEL)","(47,JULIA,BARRYMORE)","(91,CHRI
BULWORTH COMMANDMENTS|{"(1,PENELOPE,GUINESS)","(65,ANGELA,HUDSON)","(124,SCARLETT,BENING)","(173,ALAN,DREYFUSS)"}
CHEAPER CLYDE |{"(1,PENELOPE,GUINESS)","(20,LUCILLE,TRACY)"}
COLOR PHILADELPHIA |{"(1,PENELOPE,GUINESS)","(106,GROUCHO,DUNST)","(122,SALMA,NOLTE)","(129,DARYL,CRAWFORD)","(163,C
ELEPHANT TROJAN |{"(1,PENELOPE,GUINESS)","(24,CAMERON,STREEP)","(37,VAL,BOLGER)","(107,GINA,DEGENERES)","(115,HAR
GLEAMING JAWBREAKER |{"(1,PENELOPE,GUINESS)","(66,MARY,TANDY)","(125,ALBERT,NOLTE)","(143,RIVER,DEAN)","(155,IAN,TAND
`

How does it work?

- We’re grouping by FILM and aggregate the contents per film into a nested collection.
- We can now use HAVING to filter on groups.
- BOOL_OR(TRUE) is TRUE as soon as the GROUP is non-empty
- FILTER (WHERE a.actor_id = 1) was that filter criteria, which we place in the group

So, the HAVING predicate is TRUE if there is at least one ACTOR_ID = 1, or NULL otherwise, which has the same effect as FALSE. If you’re a purist, wrap the predicate in COALESCE(BOOL_OR(...), FALSE)

Clever or neat, or a bit of both?

#### Doing this with jOOQ

Here’s the jOOQ version, that works on any RDBMS that supports MULTISET_AGG (ARRAY_AGG emulation is still pending):

`

ctx.select(
FILM_ACTOR.film().TITLE,
multisetAgg(
FILM_ACTOR.actor().FIRST_NAME,
FILM_ACTOR.actor().LAST_NAME))
.from(FILM_ACTOR)
.groupBy(FILM_ACTOR.film().TITLE)
.having(boolOr(trueCondition())
.filterWhere(FILM_ACTOR.actor().ACTOR_ID.eq(1)))
.orderBy(FILM_ACTOR.film().TITLE)
.fetch();

`

While the powerful MULTISET value constructor gets most of the fame with jOOQ users, let’s not forget there is also a slightly less powerful, but occasionally really useful MULTISET_AGG aggregate function, which can be used for aggregations or as a window function!

- More

#### Like this:

Like Loading...

###
Published by lukaseder

I made jOOQ
View all posts by lukaseder

---

[Original source](https://blog.jooq.org/how-to-filter-a-sql-nested-collection-by-a-value/)

Reply