Using jOOQ’s Implicit Join From Within the JOIN .. ON Clause
Starting with jOOQ 3.11, type safe implicit JOIN have been made available, and they’ve been enhanced to be supported also in DML statements in jOOQ 3.17. Today, I’d like to focus on a somewhat weird but really powerful use-case for implicit JOIN, when joining additional tables from within an explicit JOIN‘s ON clause. The use … Continue reading Using jOOQ’s Implicit Join From Within the JOIN .. ON Clause →
---
Using jOOQ’s Implicit Join From Within the JOIN .. ON Clause
Posted on September 13, 2022 by lukaseder
Starting with jOOQ 3.11, type safe implicit JOIN have been made available, and they’ve been enhanced to be supported also in DML statements in jOOQ 3.17. Today, I’d like to focus on a somewhat weird but really powerful use-case for implicit JOIN, when joining additional tables from within an explicit JOIN‘s ON clause.
The use case
The jOOQ code generator makes heavy use of jOOQ when querying the various dictionary views. In PostgreSQL, most queries go to the SQL standard information_schema, but every now and then, the standard meta data is insufficient, and we also have to query the pg_catalog, which is more complete but also much more technical.
For a lot of information_schema views, there exists an almost equivalent pg_catalog table which contains the same information. For example:
| information_schema | pg_catalog
| schemata | pg_namespace
| tables or user_defined_types | pg_class
| columns or attributes | pg_attribute
Interestingly, PostgreSQL being an ORDBMS, tables and user defined types are the same thing and often interchangeable in the type system, but that’s a topic for a future blog post.
The point of this blog post is that often, when querying a view like information_schema.attributes, we also have to query pg_catalog.pg_attribute to get additional data. For example, in order to find the declared array dimension of a UDT (User Defined Type) attribute, we have to access pg_catalog.pg_attribute.attndims, as this information is nowhere to be found in the information_schema. See also jOOQ feature request #252, where we’ll add support for H2 / PostgreSQL multi dimensional arrays.
So, we might have a UDT like this:
`
CREATE TYPE u_multidim_a AS (
i integer[][],
n numeric(10, 5)[][][],
`
The canonical SQL way to access the pg_attribute table from the attributes view is:
`
SELECT
is_a.udt_schema,
is_a.udt_name,
FROM information_schema.attributes AS is_a
JOIN pg_attribute AS pg_a
ON is_a.attribute_name = pg_a.attname
JOIN pg_class AS pg_c
ON is_a.udt_name = pg_c.relname
AND pg_a.attrelid = pg_c.oid
JOIN pg_namespace AS pg_n
ON is_a.udt_schema = pg_n.nspname
AND pg_c.relnamespace = pg_n.oid
WHERE is_a.data_type = 'ARRAY'
ORDER BY
is_a.udt_schema,
is_a.udt_name,
`
To visualise:
`
+----- udt_schema = nspname ------> pg_namespace
| ^
| |
| oid
| =
| relnamespace
| |
| v
+------- udt_name = relname ------> pg_class
| ^
| |
| oid
| =
| attrelid
| |
| v
is.attributes <-+- attribute_name = attname ------> pg_attribute
`
And now, we can see a few of our integration test user defined types, containing multi dimensional arrays:
`
|udt_schema|udt_name |attribute_name|attndims|
|----------|------------|--------------|--------|
|public |u_multidim_a|i |2 |
|public |u_multidim_a|n |3 |
|public |u_multidim_a|v |4 |
|public |u_multidim_b|a1 |1 |
|public |u_multidim_b|a2 |2 |
|public |u_multidim_b|a3 |3 |
|public |u_multidim_c|b |2 |`
But look at all those JOIN expressions. They’re definitely no fun. We have to spell out the entire path from pg_attribute to pg_namespace, only to make sure we’re not fetching any ambiguously named data from other UDTs or other schemata.
Using implicit joins instead
And that’s where the power of implicit JOIN come in play. What we really want to write in SQL is this:
`
SELECT
is_a.udt_schema,
is_a.udt_name,
-- This table we need
FROM information_schema.attributes AS is_a
-- And also this one
JOIN pg_attribute AS pg_a
ON is_a.attribute_name = pg_a.attname
-- But the path joins from pg_attribute to pg_namespace should
-- be implicit
AND pg_a.pg_class.relname = is_a.udt_name
AND pg_a.pg_class.pg_namespace.nspname = is_a.udt_schema
WHERE is_a.data_type = 'ARRAY'
ORDER BY
is_a.udt_schema,
is_a.udt_name,
`
It’s not that much shorter, but it’s definitely very convenient to no longer have to think about how to join the different steps. Note that unlike other cases, where we used implicit joins via these paths in SELECT or WHERE, this time we’re using them from within a JOIN .. ON clause! In jOOQ, we can write:
`
Attributes isA = ATTRIBUTES.as("is_a");
PgAttribute pgA = PgAttribute.as("pg_a");
ctx.select(
isA.UDT_SCHEMA,
isA.UDT_NAME,
.on(isA.ATTRIBUTE_NAME.eq(pgA.ATTNAME))
.and(isA.UDT_NAME.eq(pgA.pgClass().RELNAME))
.and(isA.UDT_SCHEMA.eq(pgA.pgClass().pgNamespace().NSPNAME))
.where(isA.DATA_TYPE.eq("ARRAY"))
.orderBy(
isA.UDT_SCHEMA,
isA.UDT_NAME,
`
The generated SQL looks slightly different from the original one, as jOOQ’s implicit JOIN algorithm will never flatten the JOIN tree in order to preserve any potential JOIN operator precedence, which is important in the event of there being LEFT JOIN, FULL JOIN or other operators present. The output looks more like this:
`
FROM information_schema.attributes AS is_a
JOIN (
pg_catalog.pg_attribute AS pg_a
JOIN (
pg_catalog.pg_class AS alias_70236485
JOIN pg_catalog.pg_namespace AS alias_96617829
ON alias_70236485.relnamespace = alias_96617829.oid
ON pg_a.attrelid = alias_70236485.oid
ON (
is_a.attribute_name = pg_a.attname
AND is_a.udt_name = alias_70236485.relname
AND is_a.udt_schema = alias_96617829.nspname
`
As you can see, the “readable” table aliases (is_a and pg_a) are the user-provided ones, whereas the “unreadable,” system generated ones (alias_70236485 and alias_96617829) are the ones originating from the implicit JOIN. And, again, it’s important that these implicit joins are embedded right where they belong, with the path root pg_a, from which we started the path expressions. That’s the only way we can retain the correct JOIN operator precedence semantics, e.g. if we had used a LEFT JOIN between is_a and pg_a
Future improvements
In the future, there might be even better JOIN paths that allow for connecting such graphs directly, because every time you have to join information_schema.attributes and pg_catalog.pg_attribute, you’ll have to repeat the same equalities on the (udt_schema, udt_name, attribute_name) tuple, and while implicit JOIN have been helpful, it’s easy to see how this can be further improved. The ideal query would be:
`
SELECT
is_a.udt_schema,
is_a.udt_name,
FROM information_schema.attributes AS is_a
-- Magic here
MAGIC JOIN pg_attribute AS pg_a
ON jooq_do_your_thing
WHERE is_a.data_type = 'ARRAY'
ORDER BY
is_a.udt_schema,
is_a.udt_name,
`
But we’re not quite there yet.
Getting access to these join paths
Neither the information_schema views, nor the pg_catalog tables expose any foreign key meta data, which are a prerequisite for implicit join path expressions and other jOOQ code generation features. This isn’t a huge problem as you can specify synthetic foreign keys to the code generator, for precisely this reason. See also our previous blog post about synthetic foreign keys for information schema queries. In this case, all we need is at least this specification:
`
<configuration>
<generator>
<database>
<fields><field>attrelid</field></fields>
<referencedTable>pg_class</referencedTable>
</foreignKey>
<foreignKey>
<tables>pg_class</tables>
<fields><field>relnamespace</field></fields>
<referencedTable>pg_namespace</referencedTable>
</foreignKey>
</foreignKeys>
</syntheticObjects>
`
And ta-dah, we have our JOIN paths as seen in the previous examples.
- More
#### Like this:
Like Loading...
###
Published by lukaseder
I made jOOQ
View all posts by lukaseder
---
[Original source](https://blog.jooq.org/using-jooqs-implicit-join-from-within-the-join-on-clause/)