Aggregation & grouping

Aggregation computes values across many rows instead of fetching them one by one. With yara_orm you build aggregates such as Count, Sum, Avg, Min and Max, attach them to a query as annotations, and optionally group by one or more columns. Queries stay lazy and chainable until you await them.

The models used throughout this guide:

from yara_orm import Model, fields


class Author(Model):
    id = fields.IntField(pk=True)
    name = fields.CharField(max_length=120)


class Book(Model):
    id = fields.IntField(pk=True)
    title = fields.CharField(max_length=200)
    rating = fields.DecimalField(max_digits=3, decimal_places=1, default=0)
    author = fields.ForeignKeyField("Author", related_name="books")

Aggregate functions

The aggregate expressions are imported directly from yara_orm:

from yara_orm import Count, Sum, Avg, Min, Max

Aggregate	SQL	Typical target
`Count`	COUNT	a relation (reverse FK / M2M) or column
`Sum`	SUM	a numeric column
`Avg`	AVG	a numeric column
`Min`	MIN	any orderable column
`Max`	MAX	any orderable column

Every aggregate shares the same constructor:

Aggregate(field, distinct=False, _filter=None)

field: the name of a column (e.g. "rating") or a relation (e.g. "books", a reverse foreign key). When the target is a relation, the query compiler adds the necessary JOIN for you. It may also be a column expression (F(...) / arithmetic) or a Case, which is rendered inline (see Filtered and conditional aggregates).
distinct: when True, the aggregate considers distinct values only; Count, for example, compiles to COUNT(DISTINCT ...).
_filter: an optional Q restricting which rows feed the aggregate, compiling to ... FILTER (WHERE ...) (see Filtered and conditional aggregates).

# Count the distinct books linked to each author
Count("books", distinct=True)
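
To make the effect of distinct concrete, the sketch below runs the two SQL forms side by side with Python's built-in sqlite3 module. The book table and its rows are made up for illustration, and the SQL is written by hand rather than captured from yara_orm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, rating REAL, author_id INTEGER);
    INSERT INTO book VALUES (1, 4.5, 1), (2, 4.5, 1), (3, 3.0, 1);
""")

# COUNT(col) counts every non-NULL value; COUNT(DISTINCT col) counts unique ones.
plain, distinct = conn.execute(
    "SELECT COUNT(rating), COUNT(DISTINCT rating) FROM book"
).fetchone()
print(plain, distinct)  # 3 2
```

The difference matters most when a JOIN repeats rows: the distinct form counts each value once, however many times the join duplicated it.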

Annotating a query

.annotate(**annotations) adds computed columns to a query. Each keyword becomes the output name; each value is an aggregate expression.

# Count over a relation (reverse FK): how many books each author has
qs = Author.annotate(book_count=Count("books"))

# Aggregate over a column: the average of rating (use .group_by() for a table-wide figure)
Book.annotate(avg_rating=Avg("rating"))

You can attach several annotations at once:

Book.annotate(
    avg=Avg("rating"),
    lo=Min("rating"),
    hi=Max("rating"),
)

Reading annotated results

Awaiting an annotated queryset returns ordinary model instances, with each annotation set as an attribute named after its keyword:

for author in await Author.annotate(book_count=Count("books")):
    print(author.name, author.book_count)

Projecting with .values()

When you only need the computed numbers (not full model instances), project with .values(...) to get plain dicts, or .values_list(...) for tuples:

rows = await Author.annotate(book_count=Count("books")).values("name", "book_count")
# [{"name": "Ada", "book_count": 2}, {"name": "Bob", "book_count": 1}, ...]

annotate(...).values() with no field list keeps the base columns

Calling .values() with no explicit fields on a pure annotate query returns the base model's columns alongside the annotation, grouped by the primary key:

rows = await Author.annotate(n=Count("books")).values()
# [{"id": 1, "name": "Ada", "n": 2}, {"id": 2, "name": "Bob", "n": 1}, ...]
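
As a rough picture of what such a query asks the database for, here is a hand-written SQL equivalent run through Python's sqlite3 module. The table names, the LEFT JOIN and the extra author Cy are assumptions made for this sketch; the SQL yara_orm actually emits may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows behave like mappings, so dict(row) works
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book VALUES (1, 'Engines', 1), (2, 'Notes', 1), (3, 'Loops', 2);
""")

# One output row per author: base columns plus the aggregate, grouped by the PK.
rows = [dict(r) for r in conn.execute("""
    SELECT a.id, a.name, COUNT(b.id) AS n
    FROM author AS a
    LEFT JOIN book AS b ON b.author_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""")]
print(rows)
```

COUNT(b.id) skips the NULL that the LEFT JOIN produces for an author without books, which is why Cy gets 0 rather than 1.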

Grouping with `group_by`

.group_by(*fields) groups the result rows by the given columns. Combine it with an annotation and a projection to produce one aggregated row per group:

rows = (
    await Book.annotate(total=Sum("rating"))
    .group_by("author_id")
    .values("author_id", "total")
)
# [{"author_id": 1, "total": Decimal("8.0")}, {"author_id": 2, "total": Decimal("4.0")}]
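
Conceptually, group_by plus Sum buckets rows by the grouping key and adds up each bucket. The plain-Python sketch below mirrors that with made-up rows. It illustrates the semantics only; in yara_orm the database does this work, and the row values here are assumptions chosen to match the output above.

```python
from collections import defaultdict
from decimal import Decimal

books = [
    {"author_id": 1, "rating": Decimal("4.5")},
    {"author_id": 1, "rating": Decimal("3.5")},
    {"author_id": 2, "rating": Decimal("4.0")},
]

# Bucket by the grouping key and accumulate; Decimal() starts each bucket at 0.
totals = defaultdict(Decimal)
for book in books:
    totals[book["author_id"]] += book["rating"]

rows = [{"author_id": key, "total": total} for key, total in sorted(totals.items())]
print(rows)
```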

Aggregating over the whole table

Calling .group_by() with no arguments collapses every row into a single group, which is handy for table-wide statistics:

[row] = (
    await Book.annotate(avg=Avg("rating"), lo=Min("rating"), hi=Max("rating"))
    .group_by()
    .values("avg", "lo", "hi")
)
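
In SQL terms this corresponds to aggregates with no GROUP BY clause at all. A small sqlite3 sketch with made-up ratings shows why the result can be unpacked as a single row (the SQL is hand-written, not captured from yara_orm):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, rating REAL);
    INSERT INTO book VALUES (1, 4.5), (2, 3.5), (3, 4.0);
""")

# Aggregates without GROUP BY treat the whole table as one group,
# so exactly one row comes back and it can be unpacked directly.
[row] = conn.execute(
    "SELECT AVG(rating), MIN(rating), MAX(rating) FROM book"
).fetchall()
avg, lo, hi = row
print(avg, lo, hi)  # 4.0 3.5 4.5
```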

Grouping and projecting across a relation

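group_by(), values() and values_list() accept __-separated forward-relation paths, and the related table is joined automatically. values() can also alias them to clean output keys. The example assumes Author has a country column, which the models at the top of this guide do not define: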

rows = (
    await Book.annotate(n=Count("id"))
    .group_by("author__country", "author__name")
    .order_by("-n")
    .values(country="author__country", author="author__name", n="n")
)

Filtering annotations → HAVING

yara_orm decides between WHERE and HAVING by what you filter on:

Filtering by a normal field adds a WHERE condition (applied before grouping).
Filtering by an annotation name adds a HAVING condition (applied after the aggregate is computed).

# HAVING COUNT(...) >= 1 — keep only authors that have at least one book
Author.annotate(books=Count("books")).filter(books__gte=1)

Because books is an annotation, the books__gte=1 lookup compiles to HAVING. If you had filtered on a column such as name__startswith="A", that would compile to WHERE instead. The two can be mixed freely in a single .filter(...) call.
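
The difference between the two clauses is easiest to see in raw SQL. The sketch below uses sqlite3 with hand-written queries; the table names, the LEFT JOIN and the third author are assumptions for illustration rather than yara_orm's actual output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book VALUES (1, 1), (2, 1), (3, 2);
""")

BASE = """
    SELECT a.name, COUNT(b.id) AS books
    FROM author AS a
    LEFT JOIN book AS b ON b.author_id = a.id
    {where}
    GROUP BY a.id
    HAVING COUNT(b.id) >= 1
    ORDER BY a.id
"""

# Annotation filter only: HAVING drops Cy, who has no books.
having_only = conn.execute(BASE.format(where="")).fetchall()

# Column filter added: WHERE removes rows before they are grouped.
mixed = conn.execute(BASE.format(where="WHERE a.name LIKE 'A%'")).fetchall()

print(having_only)  # [('Ada', 2), ('Bob', 1)]
print(mixed)        # [('Ada', 2)]
```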

exclude() accepts annotation lookups too, compiling to a negated HAVING (exclude(books__gte=1) keeps authors with no books). One exclude(...) call cannot mix annotation and column lookups — negating across the WHERE/HAVING boundary is ambiguous, so that raises FieldError; use two calls instead.

Annotation filters also carry through the write and inspection terminals: delete() / update() restrict through a grouped subquery, and count() counts the rows that survive the HAVING (with group_by(), the groups).
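
SQL's DELETE and UPDATE statements have no GROUP BY or HAVING of their own, which is why a grouped subquery is needed. One plausible shape, sketched by hand with sqlite3 (the table names and the exact statement are assumptions, not yara_orm output), deletes every author without books:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book VALUES (1, 1), (2, 1), (3, 2);
""")

# The grouped subquery selects the ids that pass the HAVING condition;
# the outer DELETE then targets exactly those ids.
conn.execute("""
    DELETE FROM author
    WHERE id IN (
        SELECT a.id
        FROM author AS a
        LEFT JOIN book AS b ON b.author_id = a.id
        GROUP BY a.id
        HAVING COUNT(b.id) = 0
    )
""")

remaining = [name for (name,) in conn.execute("SELECT name FROM author ORDER BY id")]
print(remaining)  # ['Ada', 'Bob']
```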

select_related() combines with annotate()

Annotations ride along a select_related() join plan in a single SELECT: the related instances hydrate as usual and each annotation value is set as an attribute. The example uses a Call model with contact and disposition relations, which is not one of the models defined at the top of this guide:

from yara_orm import RawSQL

calls = (
    await Call.filter(org_id=org_id)
    .select_related("contact", "disposition")
    .annotate(duration_rank=RawSQL("RANK() OVER (ORDER BY duration DESC)"))
    .only("id", "to_number", "started")
)
calls[0].contact.name, calls[0].duration_rank  # both loaded

Non-aggregate annotations (window functions, F arithmetic) add no GROUP BY. An aggregate annotation (or a HAVING filter) groups by the base table's primary key plus each joined relation's primary key, so a reverse-relation Count collapses back to one row per base row while the joined columns stay selectable. This relies on PostgreSQL's and MySQL's functional-dependency rule; SQLite allows bare columns.

select_for_update() still raises UnSupportedError on every annotated shape: grouped results cannot be locked, and even ungrouped annotations may carry a window expression (e.g. via RawSQL), which PostgreSQL also refuses to lock.

Note that a RawSQL fragment is opaque: a raw aggregate written as text is not detected as an aggregate, so no GROUP BY is added. Use the aggregate classes (Count/Sum/...) for grouped shapes.
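
To illustrate why a window function needs no GROUP BY, the sketch below runs the RANK() expression from the example directly in SQLite (3.25 or newer) through Python's sqlite3 module. The calls table and its rows are hypothetical stand-ins for the Call model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls (id INTEGER PRIMARY KEY, duration INTEGER);
    INSERT INTO calls VALUES (1, 30), (2, 90), (3, 60);
""")

# A window function ranks rows without collapsing them: no GROUP BY involved.
rows = conn.execute("""
    SELECT id, duration, RANK() OVER (ORDER BY duration DESC) AS duration_rank
    FROM calls
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 30, 3), (2, 90, 1), (3, 60, 2)]
```

Every input row survives with its rank attached, in contrast to an aggregate, which would fold the three rows into one per group.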

Putting it together

A realistic report: group authors, count their books, average the ratings, keep only authors with at least one book, and order by the busiest author first.

from yara_orm import Count, Avg

authors = (
    await Author.annotate(
        book_count=Count("books"),
        avg_rating=Avg("books__rating"),
    )
    .filter(book_count__gte=1)        # HAVING COUNT(books) >= 1
    .order_by("-book_count")          # busiest authors first
)

for author in authors:
    print(author.name, author.book_count, author.avg_rating)

Here Avg("books__rating") reaches across the books relation to the related rating column using the relation__column path, filter(book_count__gte=1) becomes a HAVING clause, and order_by("-book_count") sorts by the annotation in descending order.

Scalar functions

Beyond aggregates, .annotate(...) accepts scalar functions that compute a value per row. They are imported from yara_orm:

from yara_orm import Lower, Upper, Length, Trim, Concat, Coalesce

Function	SQL	Example
`Lower(field)`	`LOWER(col)`	`Lower("title")`
`Upper(field)`	`UPPER(col)`	`Upper("title")`
`Length(field)`	`LENGTH(col)`	`Length("title")`
`Trim(field)`	`TRIM(col)`	`Trim("title")`
`Concat(*fields)`	`col1 || col2 || …`	`Concat("title", "author_id")`
`Coalesce(field, default)`	`COALESCE(col, default)`	`Coalesce("title", "untitled")`

for book in await Book.annotate(slug=Lower("title"), n=Length("title")):
    print(book.slug, book.n)

Concat renders with the standard || operator on PostgreSQL and SQLite, and as CONCAT(...) on MySQL (where || is logical OR by default), so the result is the same on all three.
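
For a feel of what these functions return, the sketch below evaluates their SQL counterparts in SQLite via Python's sqlite3 module. The rows are made up, and the SQL is hand-written rather than generated by yara_orm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO book VALUES (1, ' Dune ', 7), (2, NULL, 8);
""")

rows = conn.execute("""
    SELECT LOWER(TRIM(title)),          -- nested scalar functions
           LENGTH(title),               -- counts the padding spaces too
           COALESCE(title, 'untitled'), -- fallback for NULL
           TRIM(title) || author_id     -- || concatenation in SQLite
    FROM book
    ORDER BY id
""").fetchall()
print(rows[0])  # ('dune', 6, ' Dune ', 'Dune7')
print(rows[1])  # (None, None, 'untitled', None)
```

The second row shows that a NULL input propagates through every function here except COALESCE, which is the usual reason to wrap nullable columns in it.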

Composing scalar functions

Scalar functions accept F references and other expressions as arguments, and they nest, so you can build compound expressions:

from yara_orm import F, Lower, Coalesce, Concat

Lower(F("name"))                       # LOWER(name) via an F reference
Coalesce(F("published_at"), now)       # fall back to a value expression
Coalesce(Lower("title"), "untitled")  # a function as the first argument
Concat(Lower("title"), "!")            # nest a function inside Concat

Coalesce's fallback is not limited to a literal — it may be an F reference or another function too:

Coalesce(F("nickname"), Upper("name"))

`Random()`

Random() renders RANDOM(). Used as an annotation it gives each row a random value you can then order by:

from yara_orm.functions import Random

await Book.annotate(r=Random()).order_by("r").limit(1)   # one random book

The order_by("?") shortcut documented in Querying is the concise form of the same idea.
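
The underlying SQL idea can be tried directly in SQLite. This is a hand-written sketch with made-up rows, not the statement yara_orm generates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO book VALUES (1, 'Engines'), (2, 'Notes'), (3, 'Loops');
""")

# RANDOM() is evaluated once per row; sorting by it shuffles the rows,
# and LIMIT 1 keeps a single, arbitrarily chosen one.
(title,) = conn.execute(
    "SELECT title FROM book ORDER BY RANDOM() LIMIT 1"
).fetchone()
print(title)
```

Because the database has to compute a random value for every candidate row before sorting, this pattern is best suited to small or already-filtered result sets.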

Conditional expressions: `Case` / `When`

Case builds a SQL CASE expression from one or more When arms and an optional default. Each When takes field lookups (like filter) plus a then value; the first matching arm wins.

from yara_orm import Case, When, F

graded = Book.annotate(
    grade=Case(
        When(rating__gte=4, then="high"),
        When(rating__gte=2, then="mid"),
        default="low",
    )
)

# A `then` (or default) can be an F column reference, not just a literal
bonus = Book.annotate(bonus=Case(When(rating__gte=4, then=F("rating")), default=0))
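
The "first matching arm wins" rule is inherited from SQL's CASE. A hand-written equivalent of the grade annotation, run in SQLite with made-up ratings (the generated SQL may differ in detail):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, rating REAL);
    INSERT INTO book VALUES (1, 4.5), (2, 3.5), (3, 1.0);
""")

# Arms are tested top to bottom; ELSE plays the role of `default`.
rows = conn.execute("""
    SELECT rating,
           CASE WHEN rating >= 4 THEN 'high'
                WHEN rating >= 2 THEN 'mid'
                ELSE 'low'
           END AS grade
    FROM book
    ORDER BY id
""").fetchall()
print(rows)  # [(4.5, 'high'), (3.5, 'mid'), (1.0, 'low')]
```

A rating of 4.5 satisfies both conditions but is graded 'high' because that arm comes first, so order the arms from most to least specific.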

Filtered and conditional aggregates

Aggregates accept an optional _filter=Q(...) that restricts which rows feed them, compiling to FILTER (WHERE ...) on PostgreSQL and SQLite (MySQL has no FILTER, so the equivalent AGG(CASE WHEN ... THEN ... END) is rendered there). This is the clean way to count or sum a subset per group without a separate query:

from yara_orm import Count, Sum, Q

# Per author: total books, and how many are highly rated — in one pass
report = (
    await Author.annotate(
        total=Count("books"),
        top=Count("books", _filter=Q(books__rating__gte=4)),
    )
    .group_by("id")
    .values("name", "total", "top")
)
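
The FILTER form and its CASE-based fallback produce the same numbers. The sketch below checks that in SQLite (3.30 or newer for FILTER) with hand-written SQL; the table names, the LEFT JOIN and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, rating REAL, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book VALUES (1, 4.5, 1), (2, 3.5, 1), (3, 4.0, 2);
""")

QUERY = """
    SELECT a.name, COUNT(b.id) AS total, {top} AS top
    FROM author AS a
    LEFT JOIN book AS b ON b.author_id = a.id
    GROUP BY a.id
    ORDER BY a.id
"""

# Variant 1: the aggregate's own FILTER clause.
with_filter = conn.execute(
    QUERY.format(top="COUNT(b.id) FILTER (WHERE b.rating >= 4)")
).fetchall()

# Variant 2: CASE yields NULL for non-matching rows, and COUNT skips NULLs.
with_case = conn.execute(
    QUERY.format(top="COUNT(CASE WHEN b.rating >= 4 THEN b.id END)")
).fetchall()

print(with_filter)  # [('Ada', 2, 1), ('Bob', 1, 1), ('Cy', 0, 0)]
```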

An aggregate can also wrap an expression instead of a bare column — for example a Case, so a conditional value is summed per group:

from yara_orm import Case, When, Value

# Sum 1 for each highly-rated book, 0 otherwise
Book.annotate(high=Sum(Case(When(rating__gte=4, then=1), default=Value(0))))

Value(x) wraps a Python literal so it can stand where an expression is expected — for example a Case default (default=Value(0)) or an operand in F arithmetic — making the intent explicit and unambiguous.

Subquery annotations

A Subquery can also be an annotation value, embedding a correlated scalar query as a computed column:

from yara_orm import Subquery

# Annotate each author with the id of a matching book, computed by a subquery
await Author.annotate(
    first_book_id=Subquery(Book.filter(rating__gte=4).values_list("id", flat=True))
)

See Querying for using Subquery as a filter value.
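
In SQL terms, a scalar subquery annotation is a SELECT nested in the column list that yields a single value per outer row. The hand-written sketch below, run through sqlite3, correlates the inner query on the author and uses MIN to guarantee one value. Both choices are assumptions made for illustration and are not taken from the yara_orm example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, rating REAL, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book VALUES (1, 4.5, 1), (2, 3.5, 1), (3, 4.0, 2);
""")

# The inner SELECT runs once per author row and refers to a.id from the outer query.
rows = conn.execute("""
    SELECT a.name,
           (SELECT MIN(b.id)
            FROM book AS b
            WHERE b.author_id = a.id AND b.rating >= 4) AS first_book_id
    FROM author AS a
    ORDER BY a.id
""").fetchall()
print(rows)  # [('Ada', 1), ('Bob', 3), ('Cy', None)]
```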

Raw SQL annotations

When you need an expression the helpers don't cover, drop to RawSQL. Its fragment is spliced into the SELECT verbatim, so build it only from trusted text and never from user input:

from yara_orm import RawSQL

await Book.annotate(double=RawSQL("rating * 2"))