An AI that can make you smile or cry

If you’re in a quiet place or can put on headphones, I encourage you to press play on the track below. It’s only 32 seconds long. It was written by an AI.

An AI developed by my friend’s company, Jukedeck, was asked to write something “emotive” and “cinematic.” This is what it came up with.

In this piece, I will discuss the intersection of AI and creative expression. If you like to listen to music while reading, the piece below from Jukedeck’s AI is designed to accompany this article (which should be about a five-minute read from this point forward).

Like essentially every VC, I’ve been thinking a lot about AI. It will impact (in many cases, is already impacting) virtually every industry. While I’ve spent plenty of time thinking about many of the traditional AI-related questions — future of labor, rogue ASI, simulation paradox, etc. — one quandary that is less widely discussed and more near-term is what happens when AI becomes competent at creative expression.

We traditionally think of visual arts, music, and storytelling as uniquely human pastimes. Instinctively, we conceptualize technology as best suited for automating routine processes, but relegate creative expression — so utterly non-routine and unpatterned — to the sole domain of human pursuit.

An AI that successfully generates rich, emotionally expressive content would force us to reflect more deeply, or at least more viscerally, on the nature of human consciousness — certainly more so than an AI that simply automates filing taxes or driving a car. To some extent, automating creative expression is a narrow application of the famous Turing Test (a test, developed by Alan Turing in 1950, of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human).

Here’s my bet for the first AI generated content that will make you smile or cry: music.

(Probably composed by an AI developed by my friend’s company, Jukedeck.)

Where We Are Today:

A lot of application-layer work shows great promise in commercializing AI in areas like computer vision and natural language processing. The technologies that undergird these applications (algorithm generation, modeling, selection, optimization, and hardware implementation) are improving at an accelerating rate.

A few intriguing examples:

  • A Deep Dream AI trained to recognize buildings and landscapes: given an ordinary photo, it re-renders the image as a surreal dreamscape, and it produces a similar dreamscape even when shown nothing but random white noise. [Images: original photo (top) with the Deep Dream output (bottom); the dreamscape generated from white noise (white noise in upper left).]
  • Some recent work by a team of fellow Cornellians explores, visually, what can be done with AI style transfer, an area I’m following closely. In style transfer, an AI analyzes a given item and determines its unique characteristics (its “style”); it then modifies other items to match that style. In the team’s examples, a primary image (far left) and a style image (center) are given to the AI, whose job is to modify the primary image so that it fits the characteristics of the style image; the output is displayed on the far right. This type of process isn’t limited to images. (A minimal code sketch of the idea follows this list.)
  • This video illustrates a genetic algorithm optimization technique. The simulation uses a basic physics engine and a simple simulated 2D topology to evolve vehicle designs. It first assembles components (shapes like circles and squares) at random and tests the randomly spawned vehicles, then uses their performance (distance traveled) as the basis for selecting designs to pass on to the next generation, plus a degree of random mutation. Through this loop, the algorithm automatically evolves vehicles and optimizes their performance. (A sketch of this loop also follows this list.)
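
For the technically curious, here is a minimal sketch of the classic optimization-based approach to style transfer (in the spirit of Gatys et al.), not the Cornell team’s actual code. It assumes PyTorch and torchvision with a pretrained VGG-19 as the feature extractor; the layer indices and weights are illustrative choices rather than anything canonical.

```python
# A minimal sketch of optimization-based ("Gatys-style") style transfer.
# Assumes PyTorch + torchvision; image loading/saving is omitted for brevity.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

def gram(feat):
    """Gram matrix: channel co-occurrence statistics that summarize 'style'."""
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def extract_features(x, cnn, layers):
    """Run x through the VGG feature stack, keeping activations at `layers`."""
    feats, out = {}, x
    for i, module in enumerate(cnn):
        out = module(out)
        if i in layers:
            feats[i] = out
    return feats

def style_transfer(content_img, style_img, steps=300, style_weight=1e6):
    # content_img / style_img: (1, 3, H, W) tensors, already normalized for VGG.
    cnn = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
    for p in cnn.parameters():
        p.requires_grad_(False)

    content_layers = {21}              # a mid-depth conv layer (illustrative)
    style_layers = {0, 5, 10, 19, 28}  # conv layers at several depths

    target_content = extract_features(content_img, cnn, content_layers)
    target_style = {i: gram(f) for i, f in
                    extract_features(style_img, cnn, style_layers).items()}

    # Start from the content image and optimize its pixels directly.
    result = content_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([result], lr=0.02)

    for _ in range(steps):
        opt.zero_grad()
        c_feats = extract_features(result, cnn, content_layers)
        s_feats = extract_features(result, cnn, style_layers)
        content_loss = sum(F.mse_loss(c_feats[i], target_content[i])
                           for i in content_layers)
        style_loss = sum(F.mse_loss(gram(s_feats[i]), target_style[i])
                         for i in style_layers)
        (content_loss + style_weight * style_loss).backward()
        opt.step()
    return result.detach()
```

The core trick is the Gram matrix: it throws away where features occur and keeps only how strongly they co-occur, which turns out to be a surprisingly effective mathematical proxy for “style.”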
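
And here is a minimal sketch of the genetic-algorithm loop driving simulations like the one in the video. The genome layout, population sizes, and the simulate_distance function are all hypothetical stand-ins; in the real simulation, fitness comes from actually running each vehicle through the physics engine.

```python
# A minimal sketch of a genetic algorithm for evolving vehicle designs.
# `simulate_distance` is a toy stand-in for the real physics simulation.
import random

GENOME_LENGTH = 8      # e.g., wheel radii, body vertex offsets, densities
POPULATION = 50
GENERATIONS = 100
MUTATION_RATE = 0.1

def random_genome():
    return [random.uniform(0.0, 1.0) for _ in range(GENOME_LENGTH)]

def simulate_distance(genome):
    # Placeholder fitness: the real version would build a vehicle from the
    # genome and measure how far it travels before stopping.
    return sum(g * (i + 1) for i, g in enumerate(genome)) - 2.0 * max(genome)

def crossover(a, b):
    cut = random.randint(1, GENOME_LENGTH - 1)
    return a[:cut] + b[cut:]

def mutate(genome):
    return [g if random.random() > MUTATION_RATE else random.uniform(0.0, 1.0)
            for g in genome]

def evolve():
    population = [random_genome() for _ in range(POPULATION)]
    for _ in range(GENERATIONS):
        scored = sorted(population, key=simulate_distance, reverse=True)
        survivors = scored[: POPULATION // 4]          # keep the top quarter
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(POPULATION - len(survivors))]
        population = survivors + children
    return max(population, key=simulate_distance)

if __name__ == "__main__":
    best = evolve()
    print("best genome:", [round(g, 2) for g in best],
          "fitness:", round(simulate_distance(best), 2))
```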

So, why do I think music will be the first AI generated content that will make you smile or cry?

Intuitively, it’s because music strikes a rare balance: it is emotionally rich and expressive, yet sufficiently grounded in pattern and math for AI composition to be viable. (The toy sketch below gives a sense of just how much musical structure simple statistics can capture.)
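
To make the “pattern and math” point concrete, here is a toy sketch (my own illustration, not how Jukedeck’s system works) of a first-order Markov chain: it learns which note tends to follow which from a tiny hand-written melody, then random-walks those statistics to produce a new one.

```python
# A toy first-order Markov chain over notes: learn transition statistics from
# a small melody, then generate a new melody by random-walking them.
import random
from collections import defaultdict

# Training melody as MIDI note numbers (C major, middle-C region).
training_melody = [60, 62, 64, 65, 67, 65, 64, 62, 60, 64, 67, 72, 67, 64, 60]

def train_transitions(notes):
    """Count which note tends to follow which."""
    transitions = defaultdict(list)
    for current, nxt in zip(notes, notes[1:]):
        transitions[current].append(nxt)
    return transitions

def generate(transitions, start=60, length=16):
    """Random-walk the learned transitions to produce a new melody."""
    melody = [start]
    for _ in range(length - 1):
        options = transitions.get(melody[-1])
        if not options:              # dead end: restart on the tonic
            options = [start]
        melody.append(random.choice(options))
    return melody

if __name__ == "__main__":
    model = train_transitions(training_melody)
    print(generate(model))
```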

To be clear, I think AI-generated written content and visual content will commercialize earlier and more easily than audio content, but they will likely be commercialized in more, well, “commercial” contexts (marketing content on the NLP side, and AI-powered editing tools embedded in photo and video editing suites, e.g., Adobe, on the visual side).

For AI-composed music to reach human-level quality, I believe it’ll need to make use of two key advantages:

  1. Real-time Responsiveness: One key advantage machine-generated content has relative to human content is on-the-fly adaptability. Non-static content will fundamentally alter content consumption. As the human-machine interface improves, the inputs that drive this responsiveness will become more robust, lower-latency, and more interesting. Imagine game music that adapts on the fly to what the player is doing and seeing. Imagine music that adapts in real time to the individual preferences of a given listener. Looking ahead, VR / AR are particularly interesting in that a lot of new user input methods are being invented or incorporated inside of them: motion and gesture tracking, eye tracking, facial expression tracking, brain-wave sensing, etc. All of this increases the user’s usable cognitive output, with rapid sampling rates and low latency. Music software today is capable of taking simple inputs — for example, pressing a single key on a piano — and generating dynamic backing music that adapts to the input (a toy sketch of this idea follows below). That said, the key-press input method is crude; it is the UX equivalent of a command-line interface. With all of the amazing new human input devices being invented (particularly inside VR), I can’t wait to see the GUI equivalent. Because AI-generated music (and other content) can respond dynamically to all kinds of human inputs in real time, it can create truly unparalleled, unique new forms of content to experience.
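
As a toy illustration of that key-press idea, here is a sketch of an accompaniment function that responds to two inputs: the pressed root note and a hypothetical 0-to-1 “intensity” signal standing in for richer signals like gesture, gaze, or in-game events. The mapping rules are invented for illustration, not taken from any real music software.

```python
# A toy sketch of input-responsive accompaniment: a single key press (a MIDI
# root note) plus a 0-1 "intensity" signal determines the chord voicing and
# the rhythmic density of one bar of backing music.
MAJOR_TRIAD = [0, 4, 7]
ADD_NINTH = [0, 4, 7, 14]

def accompaniment(root_note, intensity):
    """Return one bar of accompaniment as (midi_note, start_beat) pairs."""
    chord = ADD_NINTH if intensity > 0.6 else MAJOR_TRIAD
    notes = [root_note + interval for interval in chord]
    # Higher intensity -> more rhythmic subdivisions per bar.
    hits_per_bar = 2 + int(intensity * 6)          # between 2 and 8 hits
    step = 4.0 / hits_per_bar                      # a bar of 4 beats
    return [(note, round(i * step, 2))
            for i in range(hits_per_bar)
            for note in notes]

if __name__ == "__main__":
    # Player presses middle C softly, then hammers it during an action scene.
    print(accompaniment(60, 0.2))
    print(accompaniment(60, 0.9))
```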

While this discussion strictly bifurcates human composition from AI composition, one thing I haven’t covered is the significant room that will exist for human + AI collaboration in content creation. AI tools may become an integral part of the composition process and will likely be a staple within music creation software. For example, DJs may be able to press an “auto mix” button to blend two songs together or smoothly transition between them. Or imagine a DJ who knows they want to play Song A and then Song C, but is looking for the perfect Song B to bridge the two (a toy sketch of how such a feature might pick Song B follows below). Or imagine using AI style transfer (as in the image example above) to take a certain song and automatically generate a remix in your favorite genre. There are plenty of synergistic ways for AI to collaborate with and enhance human creative output! (Someone even used AI to design a sauce to go with their favorite dish.)
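
As a sketch of how a “find me Song B” feature might work under the hood, here is one naive approach: score each candidate by how close its tempo and key sit to the midpoint between Song A and Song C. The track library and metadata below are hypothetical, and a real system would likely use learned audio embeddings rather than two hand-picked features.

```python
# A toy "bridge track" picker: choose the song whose tempo and key are closest
# to the midpoint between Song A and Song C. Library data is hypothetical.
library = [
    {"title": "Track 1", "bpm": 122, "key": 5},
    {"title": "Track 2", "bpm": 126, "key": 7},
    {"title": "Track 3", "bpm": 132, "key": 9},
    {"title": "Track 4", "bpm": 118, "key": 4},
]

def distance(track, target_bpm, target_key):
    # Keys live on the 12-tone circle, so wrap the key distance around 12.
    key_diff = min((track["key"] - target_key) % 12,
                   (target_key - track["key"]) % 12)
    return abs(track["bpm"] - target_bpm) / 4.0 + key_diff

def pick_bridge(song_a, song_c, candidates):
    target_bpm = (song_a["bpm"] + song_c["bpm"]) / 2
    target_key = (song_a["key"] + song_c["key"]) / 2
    return min(candidates, key=lambda t: distance(t, target_bpm, target_key))

if __name__ == "__main__":
    song_a = {"title": "Song A", "bpm": 120, "key": 4}
    song_c = {"title": "Song C", "bpm": 128, "key": 9}
    print(pick_bridge(song_a, song_c, library)["title"])
```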

Conclusions:

For all of human history, cognition has faced an I/O problem. Input bandwidth is extremely high: our five senses ingest information at an incredibly fast rate. Output bandwidth, by contrast, is constrained. Typing sucks: it’s extremely slow and inefficient. I’d argue that speaking, singing, dancing, and body language are among our most high-bandwidth outputs, but they’re still nothing compared to our ability to intake, process, and synthesize information via our five senses. Output — expression — remains the critical bottleneck.

We could see technology seriously shift that dynamic during our lifetimes. Computing, broadly speaking, has already started this process (computers famously described as a “bicycle for the mind”), but the curve is just now beginning to accelerate as user output methods become more seamless. Adaptive AI content generation, coupled with new forms of human output capture (e.g., voice, gesture, eye tracking, facial expression tracking, brainwave mapping), could dramatically magnify our ability to express and to create, and foundationally alter the nature of the content we produce and consume.

If AI is progressing this rapidly across a multitude of domains — increasingly able to generate emotionally rich, meaningful content, to synthesize stunning visuals, and even to elucidate the very narratives and story arcs that undergird human engagement — how long before AI is able to simulate a reality that is just as meaningful as the one we currently inhabit? Assuming we’re not in a simulation (or even if we are), how might the AI revolution augment this reality?

All of this is going to take a while. A long while, most likely. With many ups and downs. But it will be absolutely fascinating to watch (and listen)!

Bonus: Here are some other great tracks by Jukedeck’s AI!
