Certainly one of MySQL 8βs largest enhancements is the assist of window capabilities. As I all the time stated in conferences, thereβs SQL earlier than window capabilities and SQL after window capabilities. When you begin utilizing them, youβll use them in all places.
A few of you poor souls are unlucky sufficient to be caught on MySQL 5.7, both of your personal selecting, or since youβre utilizing a clone / fork that’s nonetheless 5.7 appropriate. Whereas for most individuals, this weblog put up is simply in your amusement, or nostalgia, for a few of you this put up can be fairly helpful.
Utilizing native variables
A variety of Stack Overflow questions or weblog posts on the market present the identical previous trick utilizing native variables. In a procedural context, native variables make excellent sense. For instance, this assertion batch.
SET @c = (SELECT COUNT(*) FROM information_schema.tables);
-- Extra processing
-- Return the outcome:
SELECT @c;
A bit hairier is the truth that these native variables could be declared inside a question, and incremented procedurally inside a question:
SELECT
a,
-- Use and increment your variable in SELECT
@rn := @rn + 1
FROM
(
SELECT 3 AS a UNION ALL
SELECT 4 AS a
) AS t,
-- Declare your variable in FROM
(SELECT @rn := 0) r
ORDER BY a;
And growth, you may have a ROW_NUMBER() OVER (ORDER BY a) window perform! The outcome being:
|a |@rn := @rn + 1|
|---|--------------|
|3 |1 |
|4 |2 |
This works fairly by the way, as a result of the expression incrementing the row quantity βoccurs toβ be evaluated within the desired order, row by row, due to the questionβs ORDER BY a clause. Revert it:
SELECT
a, @rn := @rn + 1
FROM (
SELECT 3 AS a UNION ALL
SELECT 4 AS a
) AS t, (SELECT @rn := 0) r
ORDER BY a DESC;
And you continue to get the specified outcome:
|a |@rn := @rn + 1|
|---|--------------|
|4 |1 |
|3 |2 |
That is actually furry, as a result of it violates the concept of SQLβs logical order of operations, which most RDBMS agree upon. It assumes ORDER BY βoccurs earlier thanβ SELECT, simply because the optimiser chooses to do issues this manner. You’ll be able to tamper with the optimiser and break the βcharacteristicβ simply, e.g. by including DISTINCT:
SELECT DISTINCT
a, @rn := @rn + 1
FROM (
SELECT 3 AS a UNION ALL
SELECT 4 AS a
) AS t, (SELECT @rn := 0) r
ORDER BY a DESC;
Now the result’s now not what we needed (how might it presumably be?):
|a |@rn := @rn + 1|
|---|--------------|
|4 |2 |
|3 |1 |
The reason being that DISTINCT is often carried out utilizing a form or a hashmap, each won’t protect any ordering, and in accordance with the aforementioned logical order of operations, that is completely high-quality, as a result of ORDER BY is purported to βoccur afterβ SELECT and after DISTINCT, at the very least logically.
However should youβre cautious, and canopy every part with sufficient exams, you may nonetheless use this trick. In any case, being caught with MySQL 5.7 is already painful sufficient, so why not deal with your self to an βvirtually window performβ.
Word: Simply to point how a lot of a nasty concept relying on this incidental characteristic is, MySQL 8.x now points a deprecation warning:
Setting consumer variables inside expressions is deprecated and can be eliminated in a future launch. Take into account options: βSET variable=expression, β¦β, or βSELECT expression(s) INTO variables(s)β.
The principle cause Iβve seen this syntax getting used on Stack Overflow thus far is to emulate
ROW_NUMBER, so, Iβd say, good riddance (now that MySQL 8 has window perform assist)
PARTITION BY utilizing ORDER BY
What I havenβt seen a lot on Stack Overflow or in blogs, is PARTITION BY assist. Most options Iβve seen use ORDER BY to implement partitioning, which is okay. For instance:
SELECT
a, b,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY b DESC) AS rn1,
IF (
@prev = a,
@rn := @rn + 1,
CASE WHEN (@prev := a) IS NOT NULL OR TRUE THEN @rn := 1 END
) AS rn2
FROM (
SELECT 1 AS a, 3 AS b UNION ALL
SELECT 2 AS a, 4 AS b UNION ALL
SELECT 1 AS a, 5 AS b UNION ALL
SELECT 2 AS a, 6 AS b
) AS t, (SELECT @rn := 0, @prev := NULL) r
ORDER BY a, b DESC;
Producing:
|a |b |rn1|rn2|
|---|---|---|---|
|1 |5 |1 |1 |
|1 |3 |2 |2 |
|2 |6 |1 |1 |
|2 |4 |2 |2 |
A number of notes:
- The specified
PARTITION BYandORDER BYclauses each must be mirrored within the prime stage question. When you solely needed toORDER BY b DESC, notORDER BY aas properly, robust luck. (If you wish to mess around with this, strive eradicating theROW_NUMBER()perform, which additionally orders stuff bya, implicitly) - Iβve tried to place all of the variable project logic right into a single expression so as to keep away from any further columns being generated. This makes the expression a bit extra ugly than it wanted to be.
PARTITION BY utilizing JSON
A extra sturdy, however maybe slower method to emulating PARTITION BY can be to take care of a JSON object that retains monitor of every partition keyβs ROW_NUMBER(), as a result of why not?
Behold this magnificence:
SELECT
a, b,
ROW_NUMBER() OVER (PARTITION BY a ORDER BY b DESC) AS rn1,
json_extract(
@rn := json_set(
@rn, @path := concat('$."', a, '"'),
(coalesce(json_extract(@rn, @path), 0) + 1)
),
@path
) AS rn2,
@rn AS debug -- Added for debugging functions solely
FROM (
SELECT 1 AS a, 3 AS b UNION ALL
SELECT 2 AS a, 4 AS b UNION ALL
SELECT 1 AS a, 5 AS b UNION ALL
SELECT 2 AS a, 6 AS b
) AS t, (SELECT @rn := '{}') r
ORDER BY b DESC;
Try the outcomes:
|a |b |rn1|rn2|debug |
|---|---|---|---|--------------------|
|2 |6 |1 |1.0|{"1": 2.0, "2": 1.0}|
|1 |5 |1 |1.0|{"1": 1.0} |
|2 |4 |2 |2.0|{"1": 2.0, "2": 2.0}|
|1 |3 |2 |2.0|{"1": 2.0} |
You’ll be able to do that on MySQL 5.7 (eradicating the ROW_NUMBER(), in fact), and also youβll see this works completely high-quality! How does it work?
- We begin with an empty object
{}within theFROMclause. - On each row that’s by the way ordered by the
ORDER BY b DESCclause, weβll extract the row quantity worth for the partition keyPARTITION BY a. That is finished with a dynamically created JSON path expressionconcat('$."', a, '"'). For instance:$."1"or$."2". - At first, this worth is
NULL, in fact, so we flip it to zero withCOALESCE(., 0) - We add
1to it - Then we
JSON_SETthe worth again into the thing, assigning the outcome again to@rn. - Then, we re-extract the worth weβve simply calculated
This may very well be simplified a bit if it wasnβt only a single expression, however since Iβm fascinated about implementing this emulation in jOOQ (see #14529), I needed to do the train of retaining the projection unchanged (think about, the jOOQ consumer writes ROW_NUMBER() with jOOQ, and needs this to βsimply workβ).
Caveats:
- If the
PARTITION BYclause has a number of expressions, then the composite worth must be used as a key, e.g. utilizing some βnot possibleβ concatenation token (a token that mayβt seem within the information set), or a hash worth (risking collisions, in fact), or a further lookup, making issues fairly difficult. - The
concat('$."', a, '"')expression doesnβt correctly quoteabut, in case it comprises double quotes. - If a number of distinct window perform calculations with distinct
ORDER BYclauses are required, then this method receivedβt work as simply. It is perhaps attainable to calculate issues with one derived desk nest stage per window perform (?). Nevertheless, a number of distinctPARTITION BYclauses are high-quality. Simply generate a separate@rnvariable per distinctPARTITION BYclause. - The JSON doc may lose information kind info. For instance, in JSON, numbers could also be represented as floats, so should you require decimal precision, maybe it’s best to work with JSON strings as a substitute, and forged issues backwards and forwards, all the time valuing correctness over efficiency.
Do you assume youβve seen every part? Letβs do one thing much more furry:
DENSE_RANK with PARTITION BY
We receivedβt cease right here, as a result of as soon as weβve chosen this loopy path, we would as properly see it to the tip. Letβs emulate DENSE_RANK(), which is a bit more durable, making the SQL extra βstunningβ:
SELECT
a, b,
DENSE_RANK() OVER (PARTITION BY a ORDER BY b DESC) AS rn1,
json_extract(
@rn := json_set(@rn,
@rnpath := concat('$."rn-', a, '"'),
(coalesce(json_extract(@rn, @rnpath), 0) + IF (
json_extract(@rn, @prepath := concat('$."pre-v-', a, '"')) = b,
0, 1
)),
@prepath,
b
),
@rnpath
) AS rn2,
@rn AS debug
FROM (
SELECT 1 AS a, 3 AS b UNION ALL
SELECT 1 AS a, 5 AS b UNION ALL
SELECT 1 AS a, 5 AS b UNION ALL
SELECT 2 AS a, 6 AS b
) AS t, (SELECT @rn := '{}') r
ORDER BY b DESC;
Right hereβs the outcome:
|a |b |rn1|rn2|debug |
|---|---|---|---|------------------------------------------------------|
|2 |6 |1 |1.0|{"rn-1": 2.0, "rn-2": 1.0, "pre-v-1": 3, "pre-v-2": 6}|
|1 |5 |1 |1.0|{"rn-1": 1.0, "pre-v-1": 5} |
|1 |5 |1 |1.0|{"rn-1": 1.0, "pre-v-1": 5} |
|1 |3 |2 |2.0|{"rn-1": 2.0, "pre-v-1": 3} |
How does it differ?
- We now have to recollect not simply the earlier row quantity worth per partition (
"rn-1","rn-2"), but additionally the earlier worth ofb(theORDER BYstandards) per partition ("pre-v-1","pre-v-2"). - Then, we increment the row quantity per partition provided that the earlier worth is totally different from the present worth
Caveats:
- There can nonetheless be a number of
PARTITION BYexpressions, in addition to path escaping issues, see caveats ofROW_NUMBERfor particulars. - If there are a number of
ORDER BYcolumns, the"pre-v-n"values must bear in mind their composite worth, e.g. by nesting a JSON object. It is a bit easier to take into consideration than a number ofPARTITION BYexpressions
Bushy sufficient? Letβs go deeper
RANK with PARTITION BY
Who would have thought that RANK is more durable than DENSE_RANK (see this text for a direct comparability of the capabilities). Now, along with remembering the earlier ordering worth per partition, we additionally want to recollect the earlier rank per partition (all of the whereas persevering with to depend up the row quantity).
Word, you possibly can refactor this to one thing extra readable should you take away the jOOQ imposed single expression restriction, however the placeβs the problem in that, proper? Right here it’s, bow earlier than it in awe (or terror):
SELECT
a, b,
RANK() OVER (PARTITION BY a ORDER BY b DESC) AS rn1,
coalesce(
json_extract(
@rn := json_set(@rn,
@rnpath := concat('$."rn-', a, '"'),
@currn := coalesce(json_extract(@rn, @rnpath), 0) + 1,
@prevpath := concat('$."pre-v-', a, '"'),
b,
@prernpath := concat('$."pre-rn-', a, '"'),
IF (json_extract(@rn, @prevpath) = b,
coalesce(json_extract(@rn, @prernpath), @currn) div 1,
@currn
)
),
@prernpath
),
@currn
) AS rn2,
@rn AS debug
FROM (
SELECT 1 AS a, 3 AS b UNION ALL
SELECT 1 AS a, 5 AS b UNION ALL
SELECT 1 AS a, 5 AS b UNION ALL
SELECT 2 AS a, 6 AS b
) AS t, (SELECT @rn := '{}') r
ORDER BY b DESC;
It produces:
|a |b |rn1|rn2|debug |
|---|---|---|---|----------------------------------------------------------------------------------------|
|2 |6 |1 |1.0|{"rn-1": 3.0, "rn-2": 1.0, "pre-v-1": 3, "pre-v-2": 6, "pre-rn-1": 3.0, "pre-rn-2": 1.0}|
|1 |5 |1 |1.0|{"rn-1": 1.0, "pre-v-1": 5, "pre-rn-1": 1.0} |
|1 |5 |1 |1.0|{"rn-1": 2.0, "pre-v-1": 5, "pre-rn-1": 1.0} |
|1 |3 |3 |3.0|{"rn-1": 3.0, "pre-v-1": 3, "pre-rn-1": 3.0} |
How does it work? βMerelyβ:
Caveats:
PERCENT_RANK and CUME_DIST
Iβm not satisfied that these could be emulated with the native variable primarily based method. In precept:
PERCENT_RANK() OVER wis simply(RANK() OVER w - 1) / (COUNT(*) OVER () - 1)CUME_DIST() OVER wis simply(RANK() OVER w) / (COUNT(*) OVER ())
However as weβll see beneath, itβs not attainable (I feel?) to emulate COUNT(*) OVER () utilizing this native variable primarily based method. You would, perhaps, do one other spherical of calculations when wrapping issues in a derived desk, although.
LEAD, LAG, and many others.
A few of these can be emulated with the above approach, specifically those which might be βbackward wantingβ.
LAG: For instance, withLAG, we simply have to recollect once more the"pre-v-x"for every partition, and produce it once more on the present row. Relying on theLAGβsOFFSET, we would must preserve round a JSON array of values, all the time appending the present worth to the array, and eradicating the primary worth, like in a FIFO queue.LEAD: The ahead wanting capabilities simply must reverse theORDER BYclause. For instance, allLEADcapabilities could be carried out withLAGas properly.FIRST_VALUE: It is a bit easier thanLAG, as we donβt must preserve a whole JSON array of values. We simply bear in mind the primary one, after which preserve reproducing this.LAST_VALUEis once more simply the inverse ofFIRST_VALUEwith reversedORDER BYclause.NTH_VALUEwants a counter per partition, to make sure we catch the Nth worth. Alternatively, we are able to once more retailer every part in a JSON array till it reaches measurement N.IGNORE NULLScould be carried out by skipping all of theNULLvalues from being entered into the aforementioned FIFO queue- Issues get a bit trickier when there’s a
RANGEorROWSclause, in case of which the JSON array / FIFO queue needs to be shifted. This impactsFIRST_VALUEgreater thanLEAD, Iβd say.
The precise implementation is left as an train to the consumer. (Most likely about time to think about upgrading to MySQL 8, by now!)
Combination capabilities
All SQL mixture capabilities could be was window capabilities by appending OVER (). For instance:
SUM(x)is an mixture perform, aggregating information per group generated by theGROUP BYclause, shared by your entire question.SUM(x) OVER ()is the corresponding window perform, aggregating information per partition generated by thePARTITION BYclause per window perform (or relatively, per specific or implicit window specification)
Since these beforehand mentioned native variable primarily based approaches are row-by-row primarily based calculations, I donβt assume itβs attainable to emulate partition extensive mixture window capabilities, as a result of these require with the ability to take a look at your entire partition, together with rows that havenβt but been projected.
Nevertheless (by no means hand over!), some window frames could be emulated additionally for mixture capabilities, particularly the backward wanting ones. For simplicity, Iβll simply strive emulating this:
SUM(b) OVER (
PARTITION BY a
ORDER BY b DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
Word: with out specific window body,
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWis implicit, and that signifies that so as to embody tied rows within the sum, weβd must once more be ahead wanting, which I donβt assume is feasible with the native variable row-by-row primarily based method.Nevertheless, it is perhaps attainable to emulate different backward wanting
ROWSframes. That train is once more left to the reader.
So, letβs do that:
SELECT
a, b,
SUM(b) OVER w AS sum1,
json_extract(
@w := json_set(@w,
@spath := concat('$."s-', a, '"'),
(coalesce(json_extract(@w, @spath), 0) + b),
@cpath := concat('$."c-', a, '"'),
(coalesce(json_extract(@w, @cpath), 0) + 1)
),
@spath
) AS sum2,
COUNT(*) OVER w AS cnt1,
json_extract(@w, @cpath) AS cnt2,
AVG(b) OVER w AS avg1,
json_extract(@w, @spath) / json_extract(@w, @cpath) AS avg2,
@w AS debug
FROM (
SELECT 1 AS a, 3 AS b UNION ALL
SELECT 1 AS a, 5 AS b UNION ALL
SELECT 1 AS a, 5 AS b UNION ALL
SELECT 2 AS a, 6 AS b
) AS t, (SELECT @w := '{}') r
WINDOW w AS (
PARTITION BY a
ORDER BY b DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
ORDER BY b DESC;
The output appears right:
|a |b |sum1|sum2|cnt1|cnt2|avg1 |avg2 |debug |
|---|---|----|----|----|----|------|------------|-------------------------------------------------|
|2 |6 |6 |6.0 |1 |1.0 |6 |6 |{"c-1": 3.0, "c-2": 1.0, "s-1": 13.0, "s-2": 6.0}|
|1 |5 |5 |5.0 |1 |1.0 |5 |5 |{"c-1": 1.0, "s-1": 5.0} |
|1 |5 |10 |10.0|2 |2.0 |5 |5 |{"c-1": 2.0, "s-1": 10.0} |
|1 |3 |13 |13.0|3 |3.0 |4.3333|4.3333333333|{"c-1": 3.0, "s-1": 13.0} |
Notes:
- Iβve saved all of the window calculations in the identical JSON object, assuming all of them share the identical window specification, to allow them to reuse their values (e.g.
AVG(x) = SUM(x) / COUNT(x), and thusAVG(x) OVER w = SUM(x) OVER w / COUNT(x) OVER w) - Apart from that, issues work just about identical to for
ROW_NUMBER()
Conclusion
This has been a enjoyable weblog put up to put in writing. I hope it was useful to you both as an train to consider what window capabilities actually do, or within the worst case, that will help you poor soul truly implement issues this manner on MySQL 5.7.
There have been a whole lot of caveats. This emulation method doesnβt all the time work and makes (heavy) assumptions about your question. For instance:
- You’ll be able toβt use
DISTINCT - You’ll be able toβt use arbitrary
ORDER BYclauses that donβt match the window performβs - You’ll be able toβt use a number of window capabilities with totally different window specs
- You’ll be able toβt use ahead wanting window frames (together with frameless mixture window capabilities that mixture your entire partition)
There are in all probability extra caveats that havenβt been mentioned right here. When youβre diligent, and check issues closely, nonetheless, you may be capable to pull off utilizing these approaches. Good luck (and donβt blame me π )









