Since the beginning of Gridcosm, participants have dabbled with the idea of Grid Poem Readings. Evidence of this (including some awesome samples) can be found on the Gridcosm Readings page, which covers readings that took place around 1999 and 2000.
Since everything these days wants to use AI, why not apply this to grid poem readings as well? Toward that end, we did a little experimentation with some open source text-to-speech tools and got some really interesting results. Or, maybe, mildly disturbing? But this is par for the course with Gridcosm, so the work continued.
Generation of the level readings is time- and GPU-consuming, so more will be added over time. Eventually these will be accessible from the corresponding Gridcosm levels. In the meantime, we are also experimenting with a few episodes of a grid poem reading "podcast". Can a podcast really be a bunch of nearly 30-year-old psychedelic art-prose read by machine learning magic software? Why not!
For now, we are presenting these podcasts via a self-hosted platform called Castopod. This setup may change later, so pardon our dust.
The format of each episode is a bundling of ML-generated readings in sets of 25 levels, including some fancy intro and outro stuff and even background music! That makes it a podcast, right?
The text-to-speech software can be a little glitchy and introduce some strange artifacts. If you read along with the actual levels' poems, make note of where the robots are trying to ad lib. Sneaky robots! Castopod should allow you to follow the podcast with your favorite RSS-friendly podcast app. And, if you are into the whole fediverse scene, you can even follow/comment/like from Mastodon or any other ActivityPub-compliant thingee.
If you are asking yourself, "How can this be possible?", read on. If not, avert your eyes.
SITO has long used and championed Open Source software. And once again, it has come to the rescue. The conversion of grid poem text to speech samples is done by Tortoise TTS, which gets shockingly good results in a reasonable amount of time and processing. The rest of the process consists of a bunch of sketchy shell and Perl scripts that parse Gridcosm data and cobble together the level audio files, sound files, and other pieces into a final mp3, with the help of FFmpeg.
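For the curious, here is a rough sketch of what the "cobble together" step could look like. It is not the actual SITO tooling (that is shell and Perl); it is a Python illustration that assumes FFmpeg's concat demuxer is used to join clips. The file names (intro.wav, outro.wav, level_0001.wav) and the helper names are hypothetical, and it assumes every clip shares the same sample rate and channel layout. Mixing in background music is left out.

```python
# Hypothetical sketch: assemble one 25-level episode with FFmpeg's concat demuxer.
# File names and helper names are made up for illustration.

def episode_clips(first_level, intro="intro.wav", outro="outro.wav", count=25):
    """Return the ordered clip list: intro, `count` level readings, outro."""
    levels = [f"level_{n:04d}.wav" for n in range(first_level, first_level + count)]
    return [intro, *levels, outro]


def concat_list(paths):
    """Return the text of a concat-demuxer list file, one `file '...'` line per clip."""
    lines = []
    for p in paths:
        # Inside single quotes, the list-file syntax needs ' written as '\''
        escaped = str(p).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def episode_command(list_file, output_mp3):
    """Build (but do not run) an ffmpeg command that joins the clips into one mp3."""
    return [
        "ffmpeg",
        "-f", "concat",      # read the list file with the concat demuxer
        "-safe", "0",        # allow arbitrary paths in the list file
        "-i", str(list_file),
        "-c:a", "libmp3lame",
        "-q:a", "4",         # VBR quality; an arbitrary choice for this sketch
        str(output_mp3),
    ]
```

To use it, write the output of concat_list(episode_clips(1)) to a text file and pass the result of episode_command("list.txt", "episode1.mp3") to subprocess.run.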
Results can vary, and further experimentation is ongoing. Here is a list of observations and miscellanea that you might find entertaining.
The voice=random option is used in Tortoise TTS. This results in all kinds of chaos. It seems some "single" voices even alternate between speakers in a single segment of text. In general, expect a rollercoaster of vocal thrills.
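As a hedged illustration of the random-voice setting, the sketch below builds a command line for the do_tts.py script that ships in the Tortoise TTS repository. It assumes that script's --text, --voice, --preset, and --output_path options as we understand them; check them against your installed version. The helper name and the default values are hypothetical, and the real SITO scripts may invoke Tortoise differently.

```python
# Hypothetical helper: build (but do not run) a Tortoise TTS command line.
# Assumes the repository's tortoise/do_tts.py script and its documented options.

def tortoise_command(text, voice="random", preset="fast", output_path="results/"):
    """Return an argument list that reads `text` aloud in the given voice."""
    return [
        "python", "tortoise/do_tts.py",
        "--text", text,
        "--voice", voice,            # "random" picks an arbitrary voice each run
        "--preset", preset,          # speed/quality trade-off
        "--output_path", output_path,
    ]
```

Passing a specific voice name instead of "random" should tame the rollercoaster a bit, at the cost of some of the fun.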