Giving Voice to AI

The making of "AI Unveiled: A Glimpse into Tomorrow"
Mike explains how a simple AI video project turned into a machine-produced AI historical manifesto, and why working with AI isn't as easy as it seems.

Before we begin, I have a confession to make. When I initially pitched making a video using AI to the editorial team, I figured I’d gotten away with highway robbery. I had taken a few classes that insisted I do some storytelling using AI, so I knew how to talk to GPT models to get what I wanted. This video would be the most straightforward project of my tenure at The Navigator; a relative cakewalk compared to poring over 15 hours of footage like I did for those cooking episodes, none of which featured cake in any capacity. No, this would be an easy late-semester project where I had a quick conversation with a GPT model or two, and it would spit out my finished project! Right? ….RIGHT!?!?!

Here is my original pitch: “I would love to move forward and continue on Jack’s piece about diffusion and how it has extended beyond images into video, voice, and streaming. I would ideally have the entire thing voiced through AI.” I thought it would be a home run, and I was excited when my colleagues approved it.

Little did I know that Megan and Tianna would have the last laugh. After several hours of trying to convince ChatGPT to give me a functional script, I started logging my hours. After 15 hours of training and research (nearly 1.5 credits’ worth of class time), I got into what I consider my “sunk costs,” namely the non-skill-building parts of this project:

  • Three hours of attempting to obtain a script;
  • Ten hours of video “negotiation” (more on that later);
  • Two hours of sitting quietly in a corner, wondering where my life had gone wrong;
  • Another few hours of writing this… “memoir of a catastrophe” (let’s call it three hours).

My total time investment: 33 (ish) hours to figure out how screwed I may be when I graduate this summer.

A sample of the creepy footage generated with AI - watch those faces....
Scientists with unintentionally morphing faces.
Video Generated by: Runway ML

Negotiating with GPT Models

I use the term “negotiate” throughout this piece to describe the act of prompting the GPT models to produce the materials needed for a video project: script, footage, music, edits, and voiceover. Even though the generally used term is “prompting,” the sheer amount of back and forth usually felt closer to negotiating with (and sometimes begging) the GPT models to do what I wanted.

The “State of the (AI) Union”

We exist in a world divided between awe, frustration, and fear of the rise of AI. Despite my best efforts to produce something else, the ChatGPT overlord seemed to insist that the final product be an AI manifesto or an AI historical text rather than anything else “meta.” No matter how I negotiated, if ChatGPT was going to discuss the state of AI, it would happen on its own terms.

Fine, I thought, audibly sighing as I started the tedious process of vetting the three quotes that the model had included. None of them was accurate. I could not find any mention of two of them on Google, Bing, or DuckDuckGo. The third was attributed to Dr. Max Tegmark of MIT but should have been attributed to Dr. Stephen Hawking. After spending considerable time chasing three quotes, I asked ChatGPT if it could provide me with links to them. At that point, it informed me that it had knowingly misattributed the quotes because they were common knowledge (apparently, the “quotee” most likely said the things anyway?), and that its data model was outdated. In the end, I had ChatGPT strip the quotes from the script and moved on to Invideo AI (to produce my video) and Runway ML (to make some generated shots).

Invideo and Runway both required negotiations of their own. In particular, Invideo required a lengthy voice recording to build a model of my voice and then insisted on rewriting the script based on its available video clips to produce the actual film. Runway, the only half-decent publicly available “text-to-video” generation model I could find, produced questionable results, especially with the human form. OpenAI has a new model called Sora that is extremely, even scarily, accurate, but it is in closed testing and was unavailable for this project. As a result, I was left with footage that sometimes looked like it came out of a Rob Zombie film.

ChatGPT tells me that it's actively misattributing quotes because they are "general sentiments."

[ChatGPT] informed me that it had knowingly misattributed the quotes because [it felt] they were common knowledge.
Screen Captured from: OpenAI | ChatGPT

But what about the ethics?

ETHICS BE DAMNED! (Just kidding). Interestingly, most ethical considerations pertain to the use of unlicensed training data. This is particularly concerning for artists whose footage or artwork is used without their knowledge or consent. One perk of Invideo AI is that it uses only licensed stock footage: our filmmakers got paid! Admittedly, Runway ML is closer to a typical diffusion model. Unfortunately, Runway does not state where its training data came from, so it’s likely not ethically sourced.

So? What’d you learn? Are we screwed?

I don’t think we’re screwed. Times are just changing. Some of the technology is quite impressive, especially Sora. However, my “custom” voice model was monotone at best and struggled with words like “era,” so it still has a way to go. Will AI displace jobs? Yes, inevitably, but in its current iteration it will more likely become another tool to increase production. These models are a giant technological leap forward, and we have seen many examples of such leaps displacing even highly educated individuals in the past. Still, humans are exceedingly adaptable, and our skills will always have value or use somewhere in ways that a machine just can’t match. Our society will need to catch up. Ultimately, it is more important that we learn to use and work alongside these tools than that we fight their existence and usage. The industry doesn’t care that AI can write your essay for you. It cares that you don’t have to spend a half hour carefully crafting an email.

The weirdest thing about this project was having a platform use me as a platform. These models used me for their own promotion in a way that allowed these AI models to tell their history and wax poetic about their future. Even though I know that these models aren’t thinking, how they seemed to capitalize on my assistance instead of the other way around was creepy. Regardless, it was an interesting adventure.

Headshot of Mike Duddy

Mike is a web designer, writer, and videographer who has lived and experienced stories from all over Western Canada. He is on his third journey through VIU pursuing his degree in Digital Media Studies. In his non-VIU life, Mike spends his time hanging out with his wife and kids, exploring Vancouver Island, and learning how to be a better global citizen.
