Blog Post

Random SQL Review: DH.N Photo of the Day

Random SQL Review: DH.N Photo of the Day

Well, this is my third one of these Random SQL Reviews, so I guess we can call it a series, though I’m still not sure how regular they’ll be.

Today I want to take a look at something new over at DetroitHockey.Net. I decided recently that I wanted to have a little more automatically-generated social media content, specifically for the site’s Instagram feed, and developed a “Photo of the Day” feature.

This feature randomly selects a photo from the DH.N photo gallery, does some stuff on the PHP with the GD library to manipulate it into an appropriate size for sharing to social media, and then shares it out.  It runs every-other Thursday with the #ThrowbackThursday hashtag, so as to not be too overwhelming.

Maybe I’ll talk about the PHP side of the process later but right now I want to look at the query that drives this because it’s not as simple as it sounds.

I’ve talked about DH.N’s photo gallery before but the important context for this case is that the “photos” schema includes a photo table with metadata about the photo itself, the photo_collection table says which collections a photo belongs to (with collections having a type that can be a generic collection, a team, a player, or a country), the collection table has information about generic collections (including whether that collection is allowed to be included in the Photo of the Day), the photo_of_the_day table is a record of past selections for POTD, and data about the actual photo files are in DH.N’s asset management system – the asset table in the “core” schema.

Thus far I’ve said that this process chooses a random photo but it’s more than that.  We want to exclude any photos that were taken in the last three years and any photos that were selected as the POTD in the last five years (repeats are fine in general, as long as there’s sufficient time between them).  We also try to choose a photo that was taken on this date (or close to this date) in a previous year.  And there are some requirements around the dimensions of the photo in an effort to make sure we get high-enough quality photos for sharing.

Let’s jump right in…

$query = '
	SELECT
		photo_sorted.*,
		asset_photo.asset_path asset_path_photo,
		asset_watermarked.asset_path asset_path_watermarked
	FROM (
		SELECT photo.*
		FROM photo
			INNER JOIN (
				SELECT DISTINCT photo.photo_id
				FROM photo
					INNER JOIN photo_collection ON photo.photo_id = photo_collection.photo_id
					LEFT JOIN collection ON ((photo_collection.collection_id = collection.collection_id) AND (photo_collection.collection_type = :collection_type_general) AND (collection.collection_allow_random = 1))
					LEFT JOIN (
						SELECT
							photo_id,
							MAX(potd_timestamp) potd_recent_timestamp
						FROM photo_of_the_day
						GROUP BY photo_id
					) potd_recent ON photo.photo_id = potd_recent.photo_id
				WHERE ((photo_height >= :min_size) OR (photo_width >= :min_size)) AND ((collection.collection_id IS NOT NULL) OR ((photo_collection.collection_type = :collection_type_team) AND (photo_collection.collection_id IN (' . $db->buildInString($team_ids, 'team_id') . ')))) AND ((potd_recent_timestamp IS NULL) OR (TIMESTAMPDIFF(YEAR, FROM_UNIXTIME(potd_recent_timestamp), NOW()) > 5) AND (TIMESTAMPDIFF(YEAR, photo_date, NOW()) > 3))
			) photo_eligible ON photo.photo_id = photo_eligible.photo_id
		ORDER BY ABS(DATEDIFF(CONCAT_WS("-", YEAR(photo.photo_date), DATE_FORMAT(NOW(), "%m-%d")), photo.photo_date)) ASC
		LIMIT 10) photo_sorted
		INNER JOIN ' . $db->names->getCore() . '.asset asset_photo ON photo_sorted.asset_id_photo = asset_photo.asset_id
		INNER JOIN ' . $db->names->getCore() . '.asset asset_watermarked ON photo_sorted.asset_id_watermarked = asset_watermarked.asset_id
	ORDER BY RAND()
	LIMIT 1';

As I’ve previously done, I’m showing this wrapped in PHP to highlight the cross-schema joins. Additionally, I’m using another method from my custom PDO connection class that I’ll touch on momentarily.

I’m going to work from the inside out on this one.  And, for the first time, I’m going to highlight snippets as we go along.

SELECT
	photo_id,
	MAX(potd_timestamp) potd_recent_timestamp
FROM photo_of_the_day
GROUP BY photo_id

We can see that, in the middle of all of that, there’s a query that gets the most recent timestamp for each POTD. We have to get the most recent because POTDs technically can repeat, though I’m not sure how likely that is at this point given the five-year wait I mentioned above.

We join onto that set of data in an effort to get the set of eligible photos.

SELECT DISTINCT photo.photo_id
FROM photo
	INNER JOIN photo_collection ON photo.photo_id = photo_collection.photo_id
	LEFT JOIN collection ON ((photo_collection.collection_id = collection.collection_id) AND (photo_collection.collection_type = :collection_type_general) AND (collection.collection_allow_random = 1))
	LEFT JOIN (
		SELECT
			photo_id,
			MAX(potd_timestamp) potd_recent_timestamp
		FROM photo_of_the_day
		GROUP BY photo_id
	) potd_recent ON photo.photo_id = potd_recent.photo_id
WHERE ((photo_height >= :min_size) OR (photo_width >= :min_size)) AND ((collection.collection_id IS NOT NULL) OR ((photo_collection.collection_type = :collection_type_team) AND (photo_collection.collection_id IN (' . $db->buildInString($team_ids, 'team_id') . ')))) AND ((potd_recent_timestamp IS NULL) OR (TIMESTAMPDIFF(YEAR, FROM_UNIXTIME(potd_recent_timestamp), NOW()) > 5) AND (TIMESTAMPDIFF(YEAR, photo_date, NOW()) > 3))

This is the guts of the query.

We’re selecting a set of distinct photo_ids from the photo table. They need to be distinct because we’re joining onto the photo_collection table and photos can be in more than one collection. We also join onto the collection table but this is a left join because it’s possible for a photo to be in a collection but not in a generic collection, and we only want to get generic collections that are allowed to be used in a POTD (hence the collection_allow_random check).  Then we join onto that set of previous POTDs we just looked at.

With all of that data available, we can start filtering.  In our where clause, we immediately check that either the photo height or width meets the minimum requirements. Then we confirm that the photo has a generic collection ID (which has already been filtered for allowed generic collections) or is assigned to a team collection that matches a set of team IDs.  The latter part of that uses the custom PDO::buildInString() method I mentioned above, with the specified team IDs stashed away in the $team_ids variable over on the PHP side.  Then we want a photo that has either never been a POTD (therefore has a null POTD timestamp) or it’s most recent timestamp was more than five years ago.  Finally, we check that the photo itself is more than three years old.

This gets us all of the eligible photo IDs.  They’re the right size, they belong to at least one of the right collections, they’re old enough, and they haven’t been a POTD recently.  We could just randomly grab one of these except for wanting that “on this date” (or at least close to it) functionality.

To get that, we select from the photo table, join onto this set of eligible IDs, and have some fun in the order by clause.

ORDER BY ABS(DATEDIFF(CONCAT_WS("-", YEAR(photo.photo_date), DATE_FORMAT(NOW(), "%m-%d")), photo.photo_date)) ASC
LIMIT 10

We do some string concatenation to get us today’s date in the year that the photo was taken (for example, if this process ran on September 26, 2024, and the photo was from November 20, 1999, the output would be “1999-09-26”).  Then we compare that date to the photo date using DATEDIFF to determine how far the dates are apart (and then we get the absolute value of that number because we don’t care which direction the difference is in).  We order by that number and get the top ten.  This gives us ten eligible photos that are particularly appropriate for the current date.

Finally, with the photo information in hand, we do our previously-mentioned cross-schema joins into the asset management system, getting the path to the photo file and the watermarked version of the photo file.  We randomize the ten photos and limit the response to one of them, and that’s our Photo of the Day.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.