How to represent GROUP BY with HAVING COUNT(*)>1 in relational algebra?

Question 1

How to represent GROUP BY with HAVING COUNT(*)>1 in relational algebra?

sql relational-algebra

Pierre-Antoine Guillaume · Aug 27, 2017 · Viewed 11.2k times · Source

Answer

Answer

There are many variants of "relational algebra", differing even on what a relation is. You need to tell us which one you are supposed to use.

Also you don't explain what it means for a pair of RA & SQL queries to "have the form of" or "be the same as" each other. (Earlier versions.) Same result? Or also some kind of parallel structure?

Also you don't explain what "get the list of clients" means. What attributes does the result have?

If you try to write a definition of the count you are trying to use in σ count(IdClient)>1 (...)--what it inputs & what it outputs based on that--you will see that you can't. That kind of count that takes just an attribute does not correspond to a relational operator. It is used in a grouping expression--which you are missing. Such count & group are not actually relational operators, they are non-terminals in so-called relational algebras that are really query languages, designed by SQL apologists, suggesting it is easy to map SQL to relational algebra, but begging the question of how we aggregate in an algebra. Still, maybe that's the sort of "relational algebra" you were told to use.

I saw the use of count() there https://cs.stackexchange.com/questions/29897/use-count-in-relational-algebra

The nature of algebras is that the only sense in which we "use" operators "with" other operators is to pass outputs of operator calls as inputs to other operator calls. (Hence some so-called algebras are not.) In your linked answer, grouping operator G inputs aggregate name count and attribute name name and that affects the output. The answer quotes Database System Concepts, 5th Ed:

 G1, G2, ..., Gn G F1(A1), F2(A2), ..., Fm(Am) (E)
where E is any relational-algebra expression; G1, G2, ..., Gn constitute a list of attributes on which to group; each Fi is an aggregate function; and each Ai is an attribute name.

G returns rows with attributes G1, ..., A1, ... where one or more rows with the same G1, ... subrows are in E and each Ai holds the output from aggregating Fi on Ai over those rows.

But the answer when you read & linked it used that definition improperly. (I got it fixed since.) Correct is:

π name (σ phone>1 (name G count(phone) (Person)))

This is clear if you carefully read the definition.

G has misleading syntax. count(phone) is not a call of an operator; it's just a pair of arguments, an aggregate name count & an attribute name phone. Less misleading syntax would be

π name (σ phone>1 (name G count phone (Person)))

One does not need a grouping operator to write your query. That makes it all the more important to know what "relational algebra" means in the exam. It is harder if you can't use a grouping operator.

"Π Rental seems like a shady business" is unclear. You do use projection incorrectly; proper use is π attributes (relation). I guess you are using π in an attempt to involve a grouping operator like G. Re "the logic behind the math" see this.

Question 2

For an exam, I am asked to get the list of clients having more than one rent, both as an SQL query and as an algebraic expression.

For some reasons, the correction doesn't provide the algebraic version.

So now I am left with:

SELECT IdClient, Name, ...
FROM Client
WHERE IdClient IN (
    SELECT IdClient 
    FROM Rental
    GROUP BY IdClient
    HAVING COUNT(*) > 1
)

I don't know if there is a standard for algebraic notations, therefore:

Π projection
× Cartesian product
⋈ natural join
σ selection

Then I translate as:

Π IdClient, Name, ... (
    σ (count(IdClient)>1) (Π Rental) ⋈ (Client ⋈ Rental)
)

But I find no source to prove me right or wrong, especially for:

the logic behind the math
Π Rental seems like a shady business

I saw the use of count() at https://cs.stackexchange.com/questions/29897/use-count-in-relational-algebra and while it isn't used the same way, I couldn't figure out a way to use it without the projection (which I wanted to avoid.)

How to represent GROUP BY with HAVING COUNT(*)>1 in relational algebra?

Answer

Related questions