Prompted by a Podcast, I promptly prompted Copilot
In his latest podcast episode, Lex Fridman suggested leveraging code-generation tools to produce code using mainly natural-language prompts. I don't think this is a great way to maintain existing code, but it is a fantastic way to hack together something new, quickly.
I wanted to make a collage, day by day, of the podcast episodes I listened to. The app I use, Podcast Addict, stores all of my listening data in a SQLite database. I don't know the first thing about SQLite, but, for my purposes, I didn't really need to.
- Table Structure
- Creating the Query
- Organizing the information
- Downloading the images
- Making the collage
- Takeaways
Table Structure
I used the DBeaver application to explore the database and understand the table structure.
`episodes` table
- Contains all the episodes the app had ever fetched.
- Of those, the ones I'd completed had a `playbackDate > 0`.
- If there is custom artwork per episode, the `thumbnail_id` column is populated. Otherwise it's `-1`.
- Has a `podcast_id` field that relates to the `podcasts` table.

`podcasts` table
- Contains all the podcasts the app had ever fetched.
- I can match the podcast ID to the podcast name.
- Thumbnail ID is populated for each podcast.

`bitmaps` table
- All of the `thumbnail_id` values match an entry in this table.
- The URL to the image is stored here.
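With DBeaver around this wasn't strictly necessary, but the same exploration can be scripted: `sqlite_master` lists the tables, and `PRAGMA table_info` lists a table's columns. A minimal sketch against an in-memory stand-in with just the `episodes` table (the real database has `podcasts` and `bitmaps` too):

```python
import sqlite3

# In-memory stand-in for the Podcast Addict database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episodes "
    "(_id INTEGER, playbackDate INTEGER, thumbnail_id INTEGER, podcast_id INTEGER)"
)

# List every table in the database
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# List the columns of one table (row[1] is the column name)
columns = [row[1] for row in conn.execute("PRAGMA table_info(episodes)")]
```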
Creating the Query
Seems easy enough. I asked Copilot to make me a query, and it did. I could even iterate on it to get exactly the behaviour I wanted: if an episode didn't have custom artwork, return the podcast's artwork instead.
Prompts to achieve this:
- In Python, connect to the SQLite database `/home/kaushik/podcast_collage/podcastAddict.db`. Add comments for any dependencies needed. Run a SELECT query on the `episodes` table and receive the outputs.
- Create a SQL query to achieve this:
  - Get the name, URL, playbackDate, duration_ms from `episodes`.
  - Join the `episodes` and `podcasts` tables using `podcast_id`.
  - Join the `episodes` and `bitmaps` tables using the `thumbnail_id` field.
  - Filter by `playbackDate > 0`.
- This query is discarding rows where the join cannot be performed. I want all the rows. If there is no bitmap, that column should just be NULL.
- I want to match either the thumbnail_id from table x or the thumbnail_id from table p and store b.url for that ID in a column. This left join should have 2 conditions. Either x.thumbnail_id matches b._id OR p.thumbnail_id matches b._id.
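Roughly the shape of query those prompts produce, run here against a tiny in-memory database with invented rows. Note one assumption: the prompt asked for an OR'd join condition, but I've written the artwork fallback as a `CASE` so an episode with its own artwork doesn't also match the podcast's bitmap, so treat the exact form as a sketch rather than Copilot's verbatim output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE episodes (_id INTEGER, name TEXT, url TEXT, "
            "playbackDate INTEGER, duration_ms INTEGER, thumbnail_id INTEGER, podcast_id INTEGER)")
cur.execute("CREATE TABLE podcasts (_id INTEGER, name TEXT, thumbnail_id INTEGER)")
cur.execute("CREATE TABLE bitmaps (_id INTEGER, url TEXT)")

# One episode with custom artwork (thumbnail_id = 2), one without (-1),
# and one that was never played (playbackDate = 0)
cur.executemany("INSERT INTO episodes VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "Ep A", "a.mp3", 1725150000000, 3600000, 2, 10),
    (2, "Ep B", "b.mp3", 1725250000000, 1800000, -1, 10),
    (3, "Ep C", "c.mp3", 0, 1800000, -1, 10),
])
cur.execute("INSERT INTO podcasts VALUES (10, 'Pod', 1)")
cur.executemany("INSERT INTO bitmaps VALUES (?, ?)",
                [(1, "http://pod.jpg"), (2, "http://ep.jpg")])

query = """
SELECT e.name, p.name, e.url, e.playbackDate, e.duration_ms, b.url
FROM episodes e
JOIN podcasts p ON e.podcast_id = p._id
-- LEFT JOIN keeps episodes even when no bitmap matches (b.url is NULL);
-- fall back to the podcast's artwork when the episode has none
LEFT JOIN bitmaps b
  ON b._id = CASE WHEN e.thumbnail_id != -1
                  THEN e.thumbnail_id ELSE p.thumbnail_id END
WHERE e.playbackDate > 0
ORDER BY e._id
"""
rows = cur.execute(query).fetchall()
```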
Organizing the information
As a NamedTuple
I then had Copilot wrap this query in a Python script that runs the query and packages the results into a `NamedTuple`. It wrote some neat code to create those records. I particularly liked doing all the data transformations in the `from_row` class method.
I used the following prompts:
- Convert the playbackDate (which is a Unix epoch in milliseconds) into a human-readable date (discard the time). Convert the duration ms into a time object. Store the data in a NamedTuple for easier access
- Encapsulate the conversion functions in the constructor of the NamedTuple. You may want to use the class version
- Can you use typing.NamedTuple to clean this up
And it generated this code:
```python
import datetime
from typing import NamedTuple

# Define a NamedTuple to store the data
class Episode(NamedTuple):
    episode_name: str
    podcast_name: str
    url: str
    date: datetime.date
    duration: datetime.timedelta
    img_url: str

    @classmethod
    def from_row(cls, row):
        episode_name, podcast_name, url, playback_date, duration_ms, img_url = row
        # Convert playback_date to a human-readable date
        date = datetime.datetime.fromtimestamp(playback_date / 1000).date()
        # Convert duration_ms to a time object
        duration = datetime.timedelta(milliseconds=duration_ms)
        return cls(
            episode_name, podcast_name, url, date, duration, img_url
        )

# Convert the rows into Episode objects
episodes = [Episode.from_row(row) for row in rows]
```
As a DataFrame
Unfortunately, it turned out that I didn't actually need a `NamedTuple`. Perhaps I shouldn't have been so prescriptive. Since I wanted to group the data by date, I needed a Pandas `DataFrame` instead.
These prompts did the trick:
- For every date in September 2024, I want the podcast names and episodes listened to.
- Would this be easier to do if the data was organized in a DataFrame first?
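The conversion is pleasantly direct: pandas builds a `DataFrame` straight from a list of `NamedTuple`s, taking column names from the fields. A sketch with invented sample episodes (the real records carry more fields):

```python
import datetime
from typing import NamedTuple

import pandas as pd

class Episode(NamedTuple):
    episode_name: str
    podcast_name: str
    date: datetime.date
    duration: datetime.timedelta

episodes = [
    Episode("Ep A", "Pod 1", datetime.date(2024, 9, 1), datetime.timedelta(hours=1)),
    Episode("Ep B", "Pod 2", datetime.date(2024, 9, 1), datetime.timedelta(minutes=30)),
    Episode("Ep C", "Pod 1", datetime.date(2024, 9, 2), datetime.timedelta(minutes=45)),
]

# Column names come from the NamedTuple fields
df = pd.DataFrame(episodes)

# Per date: which podcasts and episodes were listened to
by_date = df.groupby("date")[["podcast_name", "episode_name"]].agg(list)
```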
Filtering and Grouping
I also had to filter date ranges since the database had records starting from 2019. While I was at it, might as well have Copilot clean up the code a little bit:
- Create a function to perform filtering and grouping. Filtering should take a date range. The end date should default to the current date.

Along the way, I pasted the errors I hit straight back into Copilot:

```
Traceback (most recent call last):
  File "/home/kaushik/podcast_collage/make_collage.py", line 68, in <module>
    september_2024_episodes = df[df["date"].dt.year == 2024][df["date"].dt.month == 9]
  ...
AttributeError: Can only use .dt accessor with datetimelike values. Did you mean: 'at'?
```

```
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  september_2024_episodes = df[df["date"].dt.year == 2024][df["date"].dt.month == 9]
```

```
TypeError: Invalid comparison between dtype=datetime64[ns] and NoneType
```
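A sketch of what that prompt produces, assuming the `DataFrame` has a datetime64 `date` column (which also sidesteps the `.dt` errors above); the function name and column names are my own, not Copilot's verbatim output:

```python
import datetime

import pandas as pd

def filter_and_group(df, start_date, end_date=None):
    # The end of the range defaults to the current date
    if end_date is None:
        end_date = datetime.date.today()
    # Combine the two masks with &, avoiding chained-indexing warnings
    mask = (df["date"] >= pd.Timestamp(start_date)) & (df["date"] <= pd.Timestamp(end_date))
    filtered = df.loc[mask]
    # Group the filtered rows by calendar date
    return filtered.groupby(filtered["date"].dt.date)

# Usage: records back to 2019, but keep only September 2024
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-05-01", "2024-09-01", "2024-09-15"]),
    "episode_name": ["Old", "A", "B"],
})
september = filter_and_group(df, datetime.date(2024, 9, 1), datetime.date(2024, 9, 30))
```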
Downloading the images
Easy enough to achieve with these prompts:
- For each of these episodes, download the image from the URL into a local directory with the date. Ignore ones that cause this error: `requests.exceptions.MissingSchema: Invalid URL '/storage/emulated/0/Android/data/com.bambuna.podcastaddict/files/podcast/Cautionary Tales/Tim_s_Tolkien_Obsession_Amazon_Prime_s_The_Rings_of_Power_24655.mp3': No scheme supplied.`
- Remove entries from the group where the img_url column doesn't start with `http`.
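A sketch of the download step, assuming an iterable of `(date, img_url)` pairs; the directory layout and helper names are my own, not Copilot's exact output:

```python
import pathlib

import requests

def keep_http_urls(urls):
    # Drop local Android file paths, which have no URL scheme
    return [u for u in urls if u.startswith("http")]

def download_images(pairs, out_root="images"):
    for date, img_url in pairs:
        if not keep_http_urls([img_url]):
            continue
        # One directory per date, e.g. images/2024-09-01/
        out_dir = pathlib.Path(out_root) / str(date)
        out_dir.mkdir(parents=True, exist_ok=True)
        resp = requests.get(img_url, timeout=30)
        resp.raise_for_status()
        (out_dir / pathlib.Path(img_url).name).write_bytes(resp.content)
```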
Making the collage
Finally, the big one. This took Copilot a little bit, but it got there in the end:
- Make a collage of these images per day
- Dynamically adjust the width and height for the collage. Say, if there are 4 images, they should be in a 2x2 grid. 2 images should be side by side, etc.
- Position is being calculated incorrectly. Nothing ever goes into the second row.
- This gets it wrong when num_images is a perfect square. 4 images should be in a 2x2 grid
- Redo this computation. A single image should be of size 100x100. Dynamically adjust the size of the canvas to arrange images of that size, based on the number of images in the group.
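The grid math those prompts converge on can be sketched like this: each tile is 100x100 and the canvas grows to fit a near-square grid, so 4 images land in a 2x2 grid and 2 images sit side by side. The names are my own, not Copilot's verbatim output:

```python
import math

from PIL import Image

TILE = 100  # each image is resized to 100x100

def make_collage(images):
    # Near-square grid: columns = ceil(sqrt(n)), rows = whatever fits the rest
    cols = math.ceil(math.sqrt(len(images)))
    rows = math.ceil(len(images) / cols)
    canvas = Image.new("RGB", (cols * TILE, rows * TILE), "white")
    for i, img in enumerate(images):
        # Fill left to right, top to bottom
        canvas.paste(img.resize((TILE, TILE)), ((i % cols) * TILE, (i // cols) * TILE))
    return canvas
```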
And voilà! Here are my collages:
Takeaways
I can sense the trepidation programmers feel at the prospect of Copilot completely replacing human programming. But I don't think that's what's going to happen.
In this case, I got a sweet little thing made in a couple of hours that I never would've tried doing in the first place. Without Copilot, I would've had to:
- pore over SQL documentation to figure out how to structure my query
- understand how to connect to a database with SQLite
- parse all the data into the appropriate structures
- figure out how to use the `requests` library to download image data from a URL
- find a library to put a collage together (I didn't think of `PIL` until Copilot suggested it)
- align the images on a PIL canvas to create the collage.
None of those individual things is particularly challenging. I could do any of them by hand. But I would've needed to figure each piece out, one by one, and put them all together.
All things considered, I ended up building something. It's local, it's for myself, but it's real and it works. That, I think, is the promise of these AI tools: to empower people to build things.