CLIP Art in 2021

The Critic and the Creator

Does all art start as a goal in one’s mind? Is this goal visual or sensory? What compels the mind to encourage the hand to paint? Do machine learning models share the same compulsion?

An Image Is Worth ~10 Words

Both images below took ~15 minutes to create using the VQ-GAN+CLIP (codebook sampling) method. Both use the “Unreal Engine” trick, which appends the words “unreal engine” to the text prompt to nudge the output toward a more lighting-sensitive, rendered look.

Both prompts are quotes from two of my favorite books: The Richest Man in Babylon by George S. Clason and The Paper Menagerie by Ken Liu.

The book quotes I passed into CLIP:

“As for time, all men have it in abundance.” - The Richest Man in Babylon

That quote created this image:

and another…

“Nothing in the cry of cicadas suggests they are about to die.” - The Paper Menagerie

The Dance of the Models

As mentioned above, these images are the result of two models: VQ-GAN and CLIP. VQ-GAN handles the image creation process, while CLIP serves as the critic for VQ-GAN. Basically, for every picture VQ-GAN creates, CLIP compares it against the text input and scores how well the two match. That score is then used to nudge VQ-GAN’s next attempt.

The goal of the algorithm is to find a convergent point where CLIP is “satisfied” by the output VQ-GAN is creating.
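To make that back-and-forth concrete, here is a minimal toy sketch of a creator/critic loop. It is not the real VQ-GAN+CLIP code: the “image” is just a short list of numbers, the critic is plain cosine similarity standing in for CLIP, and the names `critic_score`, `creator_step`, and `generate` are made up for illustration. The only thing it shares with the real method is the shape of the loop: create, score, adjust, repeat until the critic is “satisfied.”

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def critic_score(candidate, text_embedding):
    # Hypothetical stand-in for CLIP: how well does the candidate match the text?
    return cosine_similarity(candidate, text_embedding)


def creator_step(latent, text_embedding, lr=0.1, eps=1e-4):
    # Hypothetical stand-in for updating VQ-GAN's latent: estimate the gradient
    # of the critic's score by finite differences, then take a step uphill.
    base = critic_score(latent, text_embedding)
    grad = []
    for i in range(len(latent)):
        bumped = list(latent)
        bumped[i] += eps
        grad.append((critic_score(bumped, text_embedding) - base) / eps)
    return [z + lr * g for z, g in zip(latent, grad)]


def generate(text_embedding, steps=500, threshold=0.99, seed=0):
    # Start from random noise, then alternate critic and creator until the
    # critic's score passes the threshold or we run out of steps.
    rng = random.Random(seed)
    latent = [rng.uniform(-1, 1) for _ in text_embedding]
    for _ in range(steps):
        if critic_score(latent, text_embedding) >= threshold:
            break
        latent = creator_step(latent, text_embedding)
    return latent, critic_score(latent, text_embedding)


if __name__ == "__main__":
    prompt = [1.0, 0.5, -0.25, 0.0]  # made-up "text embedding"
    _, score = generate(prompt)
    print(f"final critic score: {score:.3f}")
```

In the real pipeline the latent is a grid of VQ-GAN codes, the critic compares CLIP’s image and text embeddings, and the gradient comes from backpropagation rather than finite differences, but the loop has the same rhythm.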

The Dance of Human Art

My creative process roughly follows this pipeline: ideation -> data collection -> drafting -> editing.

A person (or a group of people) has an idea. Research and resource collection are conducted (what tools are available? what is the cost of this idea? etc.)

A rough draft (approximation) is created.

Finally, the editor steps in and judges the actualized work against the idea.

From there, editing and drafting go back and forth in a cycle until an external validation metric is satisfied.

Sounds a lot like VQ-GAN and CLIP, with CLIP playing the editor!