SQL Group with Most Recent Record Each

SQL Group with Most Recent Record Each

Say you have a table with student test data.

SELECT      t.*
FROM        tests t
ORDER BY    t.student_id,
            t.taken_on
student_idscoretaken_on
117502021-08-30
117172021-11-16
117862022-02-08
215502021-08-30
214932021-11-16
216272022-02-08
314412021-08-30
315282022-02-07

Imagine that you want to see the most recent score each student received.

To put it another way, you want to query this table and see only 1 row per student_id. The row should be the one with the most recent taken_on.

The desired result looks like this:

student_idscoretaken_on
117862022-02-08
216272022-02-08
315282022-02-07

Let's start simple. You need to get a query that shows the most recent taken_on grouped by student_id first.

SELECT      student_id,
            MAX(taken_on) latest_taken_on
FROM        tests
GROUP BY    student_id

Result:

student_idlatest_taken_on
12022-02-08
22022-02-08
32022-02-07

You could also add things like a WHERE clause to that query if you need only the most recent test where a certain condition was met.

Next, you should revisit the original query.

SELECT      t.*
FROM        tests t
ORDER BY    t.student_id,
            t.taken_on

But this time, JOIN the "latest date" query you just wrote. You'll want to join on the student_id and taken_on date columns.

SELECT      t.*
FROM        tests t
JOIN        (
  SELECT      student_id,
              MAX(taken_on) latest_taken_on
  FROM        tests
  GROUP BY    student_id
            ) l
ON          t.student_id = l.student_id
AND         t.taken_on = l.latest_taken_on
ORDER BY    t.student_id

Those JOIN parameters ensure the query only returns rows where the student_id and taken_on columns match the latest date for each student.

Note that t.taken_on in the ORDER BY clause is no longer necessary because you're already ordering on t.student_id and there will only be one each.

student_idscoretaken_on
117862022-02-08
216272022-02-08
315282022-02-07

That's it. The results are exactly what we were aiming for; the row with the latest taken_on for each student_id is returned.