Scraping Images from Reddit /r/anime
Following on from my previous article about scraping images from Reddit, I decided to expand the concept into a more fully featured tool.
I think Reddit's /r/anime is one of the most entertaining subreddits. Part of what drives this is a bot that creates a discussion thread for each new episode as it goes out. These threads generate a lot of discussion and screenshots for each show.
I noticed that since the threads are generated by a bot, it's easy to search and get a list of all of the discussion threads for a show.
I decided to have a go and see how easy it would be to implement an episode search.
Search for Episodes
The PRAW library makes it trivial to search Reddit.
subreddit = reddit.subreddit('anime')
matches = subreddit.search('violet evergarden', limit=250)
From here we can turn someone's search string into a regex that finds all of the threads matching the ShowName - Episode discussion format.
REGEX_STR = '.*' + filteredName.replace(' ','.+') + '.*Episode\\D+(\\d+).+discussion.*'
Combined with the image search from the last article, this makes for a nice little scraper based on show name.
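As a rough sketch of how the pattern might be built and applied to thread titles: the helper names, the word escaping, the case-insensitive flag and the sample titles below are all assumptions for illustration, not the code the app actually runs.

```python
import re


def build_episode_pattern(show_name):
    """Compile a pattern for 'ShowName - Episode N discussion' style titles."""
    # Escape each word so regex metacharacters in the search string are
    # treated literally (an addition; the original concatenates the raw name).
    words = [re.escape(word) for word in show_name.split()]
    # Allow any text between the words, as the original '.+' replacement does.
    name_part = '.+'.join(words)
    return re.compile(
        '.*' + name_part + r'.*Episode\D+(\d+).+discussion.*',
        re.IGNORECASE,  # assumption: users type the show name in any case
    )


def find_episodes(show_name, titles):
    """Return {episode_number: title} for titles that match the pattern."""
    pattern = build_episode_pattern(show_name)
    episodes = {}
    for title in titles:
        match = pattern.match(title)
        if match:
            # Keep the first thread seen for each episode number.
            episodes.setdefault(int(match.group(1)), title)
    return episodes


# Made-up titles standing in for the search results.
sample_titles = [
    'Violet Evergarden - Episode 5 discussion',
    'Violet Evergarden - Episode 12 discussion',
    'Violet Evergarden fan art',
]
episodes = find_episodes('violet evergarden', sample_titles)
```

With real results, the titles would come from the title attribute of each submission returned by subreddit.search.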
Frontend
The frontend uses Bulma, a simple CSS-only framework, for the UI. I'm really impressed with how easy it is to get an image window up for the preview feature of the app.
The loading bars are nice too.
A bit of jQuery lets us search the API and populate the results. It's a bit hacky, so I've not gone into much detail on it; I'll be doing more work with it for the next article. The API is all served up through Azure Functions.
You can give it a try in the window below, or use the hosted app here.
Next Steps
- It'd be nice to allow scrolling through the images while in fullscreen mode through arrow keys or buttons each side of the image.
- These lookups are slow. There's an opportunity to implement a caching solution that looks at the number of comments or another flag to let us re-use lookups.
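One way the caching idea might look, as a minimal sketch only: the LookupCache class and the fetch callback are hypothetical names, and an in-memory dict is assumed purely for illustration. Since the API runs on Azure Functions, a real version would probably need storage that outlives a single function instance.

```python
class LookupCache:
    """Re-use a thread's scraped images while its comment count is unchanged."""

    def __init__(self):
        # Maps thread_id -> (num_comments at scrape time, list of image URLs).
        self._entries = {}

    def get_images(self, thread_id, num_comments, fetch):
        """Return cached images, calling fetch(thread_id) only when stale."""
        cached = self._entries.get(thread_id)
        if cached is not None and cached[0] == num_comments:
            return cached[1]
        images = fetch(thread_id)
        self._entries[thread_id] = (num_comments, images)
        return images


# Hypothetical slow scrape, replaced here by a stub that records its calls.
calls = []

def fake_fetch(thread_id):
    calls.append(thread_id)
    return ['https://example.com/' + thread_id + '.png']


cache = LookupCache()
first = cache.get_images('abc123', 40, fake_fetch)   # scrapes
second = cache.get_images('abc123', 40, fake_fetch)  # served from the cache
third = cache.get_images('abc123', 41, fake_fetch)   # new comment, scrapes again
```

The comment count is a cheap staleness flag because PRAW exposes it as num_comments on each submission, so it can be compared before any comments are walked.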