GENERATIVE AI TOOLKIT

Until recently, my creative work centered on creative writing, travel writing, and photography. With recent advancements in generative AI software, I’ve applied my background in writing to develop skills in prompt engineering across a variety of mediums, including images, video, scriptwriting, voiceovers, and music production. Together, these tools form a synergistic workflow for storytelling, each one feeding a world-building process more powerful than the sum of its parts.

CHATGPT

Text → Text — A meta-tool for building the foundation of other generative AI projects, developing written ideas through ideation, brainstorming, outlining, feedback, scriptwriting, project planning, prompt-writing, and more.

Image → Text — Using the computer vision capabilities of GPT-4 Turbo to analyze uploaded photos.

Speech → Speech chat — Using the voice-to-voice chat function to talk through ideas out loud, as speaking and listening can unlock creativity in a way that typing and reading do not. This mode has a more fluid, conversational nature that surfaces thoughts from a different state of mind than typing alone allows.

ELEVEN LABS

Text → Speech — Allows for an efficient workflow in which written or pasted text can be quickly converted to speech.

Speech → Speech synthesis — Allows for mapping synthesized voices onto manually recorded clips, with outputs that maintain the nuance of the original recordings, including pace, tone, and emphasis on particular words and syllables.

Custom Voice Creation — Synthesizing various speech clips to create unique, custom-tailored voices.

MIDJOURNEY

Text → Image — Generating images using detailed, structured prompts with attention to subject matter, character consistency, camera angles, aesthetic styles, and aspect ratios.

Image → Image — Using style reference (--sref) or character reference (--cref) images to emulate a given style or keep characters consistent across generations.
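
As a purely illustrative example (not one of my actual prompts), a prompt might read: "weathered lighthouse keeper standing on a cliff at golden hour, low-angle shot, 35mm film aesthetic --ar 16:9 --sref <reference image URL>", where --ar sets the aspect ratio and --sref points to a style reference image.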

Over 2,000 total images generated

KIVE.AI

Kive is a visual asset management system that uses machine learning to let you search images by subject matter, color (including specific hex codes), and art style (photography, illustration, collage, anime, etc.).

Used in conjunction with Midjourney, it creates a workflow in which specific styles can be identified and applied when generating images, skipping the trial and error usually required to achieve a desired aesthetic.

RUNWAY ML

Text → Video — Generating short video clips that can stand alone or be stitched together into longer segments.

Video → Video — Transforming videos with AI filters that preserve the general movement of the original clip while masking it with various aesthetics.

UDIO

Text → Music — Generating musical tracks using detailed, structured prompts with attention to automated vs. manual prompting, musical genres and sub-genres, the compatibility of mixed genres, vocal melodies, and automated vs. custom lyrics.
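
As an illustrative example (not drawn from an actual session), a prompt might pair genre and mood tags such as "dream pop, shoegaze, reverb-heavy guitars, female vocals, melancholic" with custom lyrics, or leave the lyrics automated and let the model write them.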

LALAL.AI

Music → Component Stems — This tool splits a song into its constituent stems (e.g. vocals, drums, guitar), a useful step for mixing, remixing, and otherwise reworking a song into something new in music production software. I use it in combination with Udio and Logic Pro.
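
For example, a track generated in Udio can be split into vocal and instrumental stems, and the isolated vocal can then be re-pitched, chopped, or layered over a new arrangement in Logic Pro.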