Prompted by a Podcast, I promptly prompted Copilot
In his latest podcast episode, Lex Fridman suggested leveraging code-generation tools to produce code using mainly natural-language prompts. I don't think this is a great way to maintain existing code, but it is a fantastic way to hack together something new, quickly.
I wanted to make a collage, day by day, of the podcast episodes I listened to. The app I use, Podcast Addict, stores all of my listening data in a SQLite database. I don't know the first thing about SQLite, but, for my purposes, I didn't really need to.
- Table Structure
- Creating the Query
- Organizing the information
- Downloading the images
- Making the collage
- Takeaways
Table Structure
I used the DBeaver application to explore the database and understand the table structure.
`episodes` table
- Contains all the episodes the app had ever fetched.
- Of those, the ones I'd completed had a `playbackDate > 0`.
- If there is custom artwork per episode, the `thumbnail_id` column is populated. Otherwise it's `-1`.
- Has a `podcast_id` field that relates to the `podcasts` table.

`podcasts` table
- Contains all the podcasts the app had ever fetched.
- I can match the podcast ID to the podcast name.
- Thumbnail ID is populated for each podcast.

`bitmaps` table
- All of the `thumbnail_id` values match an entry in this table.
- The URL to the image is stored here.
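With DBeaver around this wasn't strictly necessary, but the same exploration can be scripted: `sqlite_master` lists the tables, and `PRAGMA table_info` lists a table's columns. A minimal sketch against an in-memory stand-in with just the `episodes` table (the real database has `podcasts` and `bitmaps` too):

```python
import sqlite3

# In-memory stand-in for the Podcast Addict database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episodes "
    "(_id INTEGER, playbackDate INTEGER, thumbnail_id INTEGER, podcast_id INTEGER)"
)

# List every table in the database
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# List the columns of one table (row[1] is the column name)
columns = [row[1] for row in conn.execute("PRAGMA table_info(episodes)")]
```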
Creating the Query
Seems easy enough. I asked Copilot to make me a query, and it did. I could even iterate on it to get exactly the behaviour I wanted: if an episode didn't have custom artwork, return the podcast's artwork instead.
Prompts to achieve this:
- In Python, connect to the SQLite database `/home/kaushik/podcast_collage/podcastAddict.db`. Add comments for any dependencies needed. Run a SELECT query on the `episodes` table and receive the outputs.
- Create a SQL query to achieve this:
  - Get the name, URL, playbackDate, duration_ms from `episodes`.
  - Join the `episodes` and `podcasts` tables using `podcast_id`.
  - Join the `episodes` and `bitmaps` tables using the `thumbnail_id` field.
  - Filter by `playbackDate > 0`.
- This query is discarding rows where the join cannot be performed. I want all the rows. If there is no bitmap, that column should just be NULL.
- I want to match either the thumbnail_id from table x or the thumbnail_id from table p and store b.url for that ID in a column. This left join should have 2 conditions. Either x.thumbnail_id matches b._id OR p.thumbnail_id matches b._id.
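Roughly the shape of query those prompts produce, run here against a tiny in-memory database with invented rows. Note one assumption: the prompt asked for an OR'd join condition, but I've written the artwork fallback as a `CASE` so an episode with its own artwork doesn't also match the podcast's bitmap, so treat the exact form as a sketch rather than Copilot's verbatim output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE episodes (_id INTEGER, name TEXT, url TEXT, "
            "playbackDate INTEGER, duration_ms INTEGER, thumbnail_id INTEGER, podcast_id INTEGER)")
cur.execute("CREATE TABLE podcasts (_id INTEGER, name TEXT, thumbnail_id INTEGER)")
cur.execute("CREATE TABLE bitmaps (_id INTEGER, url TEXT)")

# One episode with custom artwork (thumbnail_id = 2), one without (-1),
# and one that was never played (playbackDate = 0)
cur.executemany("INSERT INTO episodes VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "Ep A", "a.mp3", 1725150000000, 3600000, 2, 10),
    (2, "Ep B", "b.mp3", 1725250000000, 1800000, -1, 10),
    (3, "Ep C", "c.mp3", 0, 1800000, -1, 10),
])
cur.execute("INSERT INTO podcasts VALUES (10, 'Pod', 1)")
cur.executemany("INSERT INTO bitmaps VALUES (?, ?)",
                [(1, "http://pod.jpg"), (2, "http://ep.jpg")])

query = """
SELECT e.name, p.name, e.url, e.playbackDate, e.duration_ms, b.url
FROM episodes e
JOIN podcasts p ON e.podcast_id = p._id
-- LEFT JOIN keeps episodes even when no bitmap matches (b.url is NULL);
-- fall back to the podcast's artwork when the episode has none
LEFT JOIN bitmaps b
  ON b._id = CASE WHEN e.thumbnail_id != -1
                  THEN e.thumbnail_id ELSE p.thumbnail_id END
WHERE e.playbackDate > 0
ORDER BY e._id
"""
rows = cur.execute(query).fetchall()
```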
Organizing the information
As a NamedTuple
I then had Copilot wrap this query in a Python script that runs the query and packages the results into a `NamedTuple`. It wrote some neat code to create those records. I particularly liked doing all the data transformations in the `from_row` class method.
I used the following prompts:
- Convert the playbackDate (which is a Unix epoch in milliseconds) into a human-readable date (discard the time). Convert the duration ms into a time object. Store the data in a NamedTuple for easier access
- Encapsulate the conversion functions in the constructor of the NamedTuple. You may want to use the class version
- Can you use typing.NamedTuple to clean this up
And it generated this code:
```python
import datetime
from typing import NamedTuple

# Define a NamedTuple to store the data
class Episode(NamedTuple):
    episode_name: str
    podcast_name: str
    url: str
    date: datetime.date
    duration: datetime.timedelta
    img_url: str

    @classmethod
    def from_row(cls, row):
        episode_name, podcast_name, url, playback_date, duration_ms, img_url = row
        # Convert playback_date to a human-readable date
        date = datetime.datetime.fromtimestamp(playback_date / 1000).date()
        # Convert duration_ms to a time object
        duration = datetime.timedelta(milliseconds=duration_ms)
        return cls(
            episode_name, podcast_name, url, date, duration, img_url
        )

# Convert the rows into Episode objects
episodes = [Episode.from_row(row) for row in rows]
```
As a DataFrame
Unfortunately, it turned out that I didn't actually need a `NamedTuple`. Perhaps I shouldn't have been so prescriptive. Since I wanted to group the data by date, I needed a Pandas `DataFrame` instead.
These prompts did the trick:
- For every date in September 2024, I want the podcast names and episodes listened to.
- Would this be easier to do if the data was organized in a DataFrame first?
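The conversion is pleasantly direct: pandas builds a `DataFrame` straight from a list of `NamedTuple`s, taking column names from the fields. A sketch with invented sample episodes (the real records carry more fields):

```python
import datetime
from typing import NamedTuple

import pandas as pd

class Episode(NamedTuple):
    episode_name: str
    podcast_name: str
    date: datetime.date
    duration: datetime.timedelta

episodes = [
    Episode("Ep A", "Pod 1", datetime.date(2024, 9, 1), datetime.timedelta(hours=1)),
    Episode("Ep B", "Pod 2", datetime.date(2024, 9, 1), datetime.timedelta(minutes=30)),
    Episode("Ep C", "Pod 1", datetime.date(2024, 9, 2), datetime.timedelta(minutes=45)),
]

# Column names come from the NamedTuple fields
df = pd.DataFrame(episodes)

# Per date: which podcasts and episodes were listened to
by_date = df.groupby("date")[["podcast_name", "episode_name"]].agg(list)
```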
Filtering and Grouping
I also had to filter date ranges since the database had records starting from 2019. While I was at it, might as well have Copilot clean up the code a little bit:
- Create a function to perform filtering and grouping. Filtering should take a date range. The end date should default to the current date.

Along the way, I pasted the errors I hit straight back into Copilot:

```
Traceback (most recent call last):
  File "/home/kaushik/podcast_collage/make_collage.py", line 68, in <module>
    september_2024_episodes = df[df["date"].dt.year == 2024][df["date"].dt.month == 9]
  ...
AttributeError: Can only use .dt accessor with datetimelike values. Did you mean: 'at'?
```

```
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  september_2024_episodes = df[df["date"].dt.year == 2024][df["date"].dt.month == 9]
```

```
TypeError: Invalid comparison between dtype=datetime64[ns] and NoneType
```
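A sketch of what that prompt produces, assuming the `DataFrame` has a datetime64 `date` column (which also sidesteps the `.dt` errors above); the function name and column names are my own, not Copilot's verbatim output:

```python
import datetime

import pandas as pd

def filter_and_group(df, start_date, end_date=None):
    # The end of the range defaults to the current date
    if end_date is None:
        end_date = datetime.date.today()
    # Combine the two masks with &, avoiding chained-indexing warnings
    mask = (df["date"] >= pd.Timestamp(start_date)) & (df["date"] <= pd.Timestamp(end_date))
    filtered = df.loc[mask]
    # Group the filtered rows by calendar date
    return filtered.groupby(filtered["date"].dt.date)

# Usage: records back to 2019, but keep only September 2024
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-05-01", "2024-09-01", "2024-09-15"]),
    "episode_name": ["Old", "A", "B"],
})
september = filter_and_group(df, datetime.date(2024, 9, 1), datetime.date(2024, 9, 30))
```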
Downloading the images
Easy enough to achieve with these prompts:
- For each of these episodes, download the image from the URL into a local directory with the date. Ignore ones that cause this error: `requests.exceptions.MissingSchema: Invalid URL '/storage/emulated/0/Android/data/com.bambuna.podcastaddict/files/podcast/Cautionary Tales/Tim_s_Tolkien_Obsession_Amazon_Prime_s_The_Rings_of_Power_24655.mp3': No scheme supplied.`
- Remove entries from the group where the img_url column doesn't start with `http`.
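A sketch of the download step, assuming an iterable of `(date, img_url)` pairs; the directory layout and helper names are my own, not Copilot's exact output:

```python
import pathlib

import requests

def keep_http_urls(urls):
    # Drop local Android file paths, which have no URL scheme
    return [u for u in urls if u.startswith("http")]

def download_images(pairs, out_root="images"):
    for date, img_url in pairs:
        if not keep_http_urls([img_url]):
            continue
        # One directory per date, e.g. images/2024-09-01/
        out_dir = pathlib.Path(out_root) / str(date)
        out_dir.mkdir(parents=True, exist_ok=True)
        resp = requests.get(img_url, timeout=30)
        resp.raise_for_status()
        (out_dir / pathlib.Path(img_url).name).write_bytes(resp.content)
```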
Making the collage
Finally, the big one. This took Copilot a little bit, but it got there in the end:
- Make a collage of these images per day
- Dynamically adjust the width and height for the collage. Say, if there are 4 images, they should be in a 2x2 grid. 2 images should be side by side, etc.
- Position is being calculated incorrectly. Nothing ever goes into the second row.
- This gets it wrong when num_images is a perfect square. 4 images should be in a 2x2 grid
- Redo this computation. A single image should be of size 100x100. Dynamically adjust the size of the canvas to arrange images of that size, based on the number of images in the group.
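The grid math those prompts converge on can be sketched like this: each tile is 100x100 and the canvas grows to fit a near-square grid, so 4 images land in a 2x2 grid and 2 images sit side by side. The names are my own, not Copilot's verbatim output:

```python
import math

from PIL import Image

TILE = 100  # each image is resized to 100x100

def make_collage(images):
    # Near-square grid: columns = ceil(sqrt(n)), rows = whatever fits the rest
    cols = math.ceil(math.sqrt(len(images)))
    rows = math.ceil(len(images) / cols)
    canvas = Image.new("RGB", (cols * TILE, rows * TILE), "white")
    for i, img in enumerate(images):
        # Fill left to right, top to bottom
        canvas.paste(img.resize((TILE, TILE)), ((i % cols) * TILE, (i // cols) * TILE))
    return canvas
```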
And voilà! Here are my collages:
Takeaways
I can sense the trepidation programmers feel at the prospect of Copilot completely replacing human programming. But I don't think that's what's going to happen.
In this case, I got a sweet little thing made in a couple of hours that I never would've tried doing in the first place. Without Copilot, I would've had to:
- pore over SQL documentation to figure out how to structure my query
- understand how to connect to a database with SQLite
- parse all the data into the appropriate structures
- figure out how to use the `requests` library to download image data from a URL
- find a library to put a collage together (I didn't think of `PIL` until Copilot suggested it)
- align the images on a PIL canvas to create the collage.
None of those individual things is particularly challenging. I could do any of them by hand. But I would've needed to figure each piece out, one by one, and put them all together.
All things considered, I ended up building something. It's local, it's for myself, but it's real and it works. That, I think, is the promise of these AI tools: to empower people to build things.