[AI Seminar] Will AI be able to write, compose and sing songs like humans?

Jul 5, 2021

The topic of the third seminar was whether AI would be able to write, compose and sing songs.

Do you remember the cyber singer Adam? Debuting in 1998, Adam was a virtual character voiced by a real human singer. Since then, technology for machine-based virtual singers has been a constant subject of research.

After cyber singers came AI vocal technology. Given a music score, an AI vocalist can sing by means of a voice synthesis system that generates a virtual singer's voice. It sings from data such as the pitch and length assigned to each syllable of the lyrics.
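The idea of singing from pitches and lengths assigned to the syllables of the lyrics can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the score format (a MIDI note number and a length in beats per syllable) and the names `Note` and `render_targets` are hypothetical, not the format of any real product such as VOCALOID. It converts each note to a frequency with the standard equal-temperament formula f = 440 · 2^((n − 69)/12) and each length to seconds at a given tempo, which is the kind of per-syllable target a synthesis engine would need before producing any sound.

```python
from dataclasses import dataclass

@dataclass
class Note:
    """One syllable of the lyrics with its pitch and length (hypothetical score format)."""
    syllable: str
    midi_pitch: int  # MIDI note number; 69 = A4, 60 = middle C
    beats: float     # length of the note in beats

def midi_to_hz(midi_pitch: int) -> float:
    """Equal-temperament frequency, with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2 ** ((midi_pitch - 69) / 12)

def render_targets(score: list[Note], bpm: float) -> list[tuple[str, float, float]]:
    """Turn a score into (syllable, frequency in Hz, duration in seconds) targets."""
    seconds_per_beat = 60.0 / bpm
    return [(n.syllable, midi_to_hz(n.midi_pitch), n.beats * seconds_per_beat)
            for n in score]

# Opening of "Twinkle, Twinkle, Little Star": C4 C4 G4 G4, one beat per syllable.
score = [Note("twin", 60, 1), Note("kle", 60, 1), Note("twin", 67, 1), Note("kle", 67, 1)]
for syllable, hz, secs in render_targets(score, bpm=120):
    print(f"{syllable:>5}: {hz:7.2f} Hz for {secs:.2f} s")
```

A real singing synthesizer would go on to shape the voice itself (vowels, vibrato, transitions between notes); the sketch stops at the pitch and timing data the score provides.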

For a machine to sing, it needs text-to-speech (TTS) technology, which artificially produces a human voice by converting text into speech. TTS is already used in many fields: AI speakers, now common in many homes, and information robots rely on it for their human-like voices.


Products using TTS Technology


Many universities and companies are researching singing voice synthesis built on TTS. Below are some of the most representative efforts.



Since 2003, the musical instrument manufacturer Yamaha has been developing VOCALOID, a voice synthesis software. The following video shows a robot singing with VOCALOID (2007). Pretty cool, huh?

The latest version is VOCALOID5, released in 2018, which comes in Standard and Premium editions.



Voice libraries for VOCALOID are also sold, both by Yamaha and by third-party developers. Each library features a different character with its own vocal characteristics and supported languages. The most famous of these characters is Hatsune Miku, developed by Crypton Future Media. Miku has an impressive résumé: she has held her own concerts and has even appeared on the Late Show with David Letterman and at Lady Gaga's concerts.




Supertone is a Korean startup founded in 2020 that is developing a voice synthesis solution capable of singing and acting. Have you seen the recent SBS TV program "Battle of the Century: AI vs Human"? On that program, AI sang one of the songs of the late Korean singer Kim Gwang-suk and also imitated the singing of Ok Ju-hyun.


Another Korean startup, Humelo, has developed an AI voice actor service and is now working on AI Music, an AI composer and lyricist. With K-pop making waves around the globe, it is exciting to see Korean companies making progress in these areas.


Would it really be possible to make AI write songs?

Writing a song usually means creating a score with melody and rhythm as well as writing lyrics. Let's start with the lyrics.

AI can already generate text using language models, and the same method can be used to "generate" lyrics. Lyrics, however, must also take musical elements into account, such as rhythm and a repeating structure like a recurring chorus. One service, Keyword to Lyrics, writes lyrics for you from a few keywords you enter.
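As a rough illustration of generating lyrics with a language model while respecting a repeating structure, here is a toy Python sketch. It swaps in a simple word-bigram model, far simpler than the neural language models real services use, trained on a tiny made-up corpus. It generates verse lines of a bounded word count starting from each keyword and repeats one fixed line as a chorus after every verse. The corpus, the function names and the song layout are all hypothetical; this is not how Keyword to Lyrics works internally, which the seminar did not describe.

```python
import random
from collections import defaultdict

def train_bigrams(corpus: list[str]) -> dict[str, list[str]]:
    """Map each word to the words that follow it in the corpus (a word-bigram model)."""
    model: dict[str, list[str]] = defaultdict(list)
    for line in corpus:
        words = line.lower().split()
        for current, following in zip(words, words[1:]):
            model[current].append(following)
    return model

def generate_line(model: dict[str, list[str]], start: str, length: int,
                  rng: random.Random) -> str:
    """Sample up to `length` words, starting from a keyword; stop early at a dead end."""
    words = [start]
    while len(words) < length and model.get(words[-1]):
        words.append(rng.choice(model[words[-1]]))
    return " ".join(words)

def write_song(model: dict[str, list[str]], keywords: list[str],
               line_length: int = 6, seed: int = 0) -> list[str]:
    """Chorus from the first keyword; one verse line per remaining keyword, each followed by the chorus."""
    rng = random.Random(seed)  # seeded so the same keywords give the same song
    chorus = generate_line(model, keywords[0], line_length, rng)
    song = []
    for keyword in keywords[1:]:
        song.append(generate_line(model, keyword, line_length, rng))  # verse line
        song.append(chorus)  # repeating structure: the same chorus every time
    return song

corpus = [
    "the night is young and the stars are bright",
    "stars fall down on the city tonight",
    "love is a song that the night sings",
    "the city sings and the stars are dancing",
]
model = train_bigrams(corpus)
for line in write_song(model, ["love", "night", "city"]):
    print(line)
```

The word-count bound stands in, very crudely, for rhythm; a real system would count syllables and stresses against the melody.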

Then, what about composing a song with melodies and rhythms?



Jukebox is a music generation model developed by OpenAI that is trained on raw audio and can produce high-quality sound in a wide variety of musical styles. You start from raw audio data and set options (genre, artist, etc.) to get a song in the style you want. However, it takes about 9 to 12 hours to produce one minute of music, and the output tends to be noisy; both are limitations still to be overcome.


AIVA is a commercial, web-based composition tool that was used to compose the keynote music for NVIDIA's GTC 2021. It is known to use reinforcement learning and draws mainly on classical music, so most of its output has a classical feel.



Amper is another commercial, web-based composition tool. It can create music based not only on a chosen genre but also on mood, style and tempo, with a focus on producing a complete song.


So far, we have taken a peek at AI-based singing and composition technologies. Imagine being able, once AI becomes widespread, to pick any song and hear it in the voice of your favorite singer. Will AI composition technology someday let us easily make music to our liking and listen to it anywhere, anytime?

AI singing and composition technologies certainly have limitations today, but these could be overcome through continued research and development. Let's hope together that they will make our culture richer and more enjoyable.