Try something new. Everyday.

Sunday, 7 June 2015

'Love is all you need!': The Beatles Lyrics Analysis

Ever since I first heard Abbey Road I have been a Beatles fan. Their rock and roll was good for any season, any time. Listening to the Beatles was more of an emotionally engaging activity than a pastime. It was fun surfing all kinds of facts and trivia about the band from Liverpool, so I decided to analyze their lyrics and look for insights into their songs that most of the data available online didn’t provide.

Doing such an analysis gave me a more scientific perspective on why the Beatles wrote what they wrote and how the audience connected with it. Could there be something common to all the songs that clicked with the Beatles’ die-hard fans? This post tries to unravel some possible reasons behind the Beatles’ rise to fame.
Using the strength of data, I dissect the lyrical musings of the Beatles and look for deeper insight into their songs. Will a pattern emerge, perhaps a formula that could guarantee success to any artist? And perhaps the power of data could let you in on a secret or two about how to take the world by storm and become the planet’s next pop sensation.

Come together
I used some open-source tools to gather and analyze the data. For the benefit of all Mr. Kites, here’s the process I went through to gather the insights:

  • BeautifulSoup: I used this Python library to scrape lyrics off the internet. After visiting numerous lyrics sites I finally stuck with this one. With a few neat lines of code I could easily scrape the lyrics.
  • nltk: This package was mainly used to tokenize the lyrics, find n-grams, compute lexical density, and do other language-based processing to find interesting patterns in the songs.
  • Sentiment Analysis: Found this awesomely simple API for sentiment analysis on Mashape.
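
As a rough illustration of the scraping step, here is a minimal sketch. The original code used BeautifulSoup against a live lyrics site; to stay runnable offline, this sketch swaps in Python's built-in html.parser. The HTML fragment, the "lyrics" class name and the LyricsExtractor and extract_lyrics helpers are all made up for illustration and do not reflect any real site's markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; real lyrics sites use their own markup.
SAMPLE_HTML = """
<html><body>
<h1>Some song title</h1>
<div class="lyrics">first placeholder line<br>second placeholder line</div>
<div class="footer">unrelated footer text</div>
</body></html>
"""

class LyricsExtractor(HTMLParser):
    """Collect the text inside <div class="lyrics"> (an assumed class name)."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # > 0 while we are inside the lyrics div
        self.lines = [""]  # lyric lines collected so far

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1        # nested div inside the lyrics block
            elif tag == "br":
                self.lines.append("")  # <br> starts a new lyric line
        elif tag == "div" and ("class", "lyrics") in attrs:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.lines[-1] += data

def extract_lyrics(html):
    """Return the non-empty lyric lines found in the page."""
    parser = LyricsExtractor()
    parser.feed(html)
    parser.close()
    return [line.strip() for line in parser.lines if line.strip()]

print(extract_lyrics(SAMPLE_HTML))
```

With BeautifulSoup the same idea would be a lookup of the lyrics container followed by a text extraction, but the exact selector depends on the site being scraped.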

I wrote all the code in Python. With all the tools ready and the lyrics scraped for analysis I was ready to take on the next step and start exploring all I could about the Beatles using just their lyrics.

Here comes the sun
The first thing I did was combine all the songs and tokenize their lyrics with some nltk magic. The corpus had about 340 songs. My first idea was to look at how word lengths were distributed in the lyrics.

There were a whopping 65781 tokens in all. To get a better grip on what was happening I calculated the lexical density, which I define here as the ratio of unique tokens (read: words) to the total number of tokens in a text (this ratio is also commonly called the type-token ratio). Let’s consider an example. “Hello good-bye, hello good-bye” has 2 unique tokens (“hello” and “good-bye”) and a total of 4 tokens, so its lexical density is 0.5.
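
The calculation can be sketched in a few lines. This is a minimal version that assumes a simple regex tokenizer as a stand-in for nltk's, so token counts on real lyrics may differ slightly from the figures in this post. The tokenize and lexical_density helper names are my own.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A simple regex stand-in for nltk's tokenizer: it keeps apostrophes and
    hyphens inside words, so "good-bye" stays a single token.
    """
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())

def lexical_density(tokens):
    """Ratio of unique tokens to total tokens, as defined in this post."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

tokens = tokenize("Hello good-bye, hello good-bye")
print(tokens)                   # ['hello', 'good-bye', 'hello', 'good-bye']
print(lexical_density(tokens))  # 0.5
```

Lowercasing before counting is a design choice: without it, “Hello” and “hello” would count as two different words.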

The Beatles’ lyrics, however, had a much lower lexical density of only about 0.05056 (3326 unique words out of 65781 tokens). That essentially means that only about 5% of the tokens are distinct words and the rest are repetitions of those words. At first glance that may seem a bit unsettling, but it could be one of the primary reasons why their songs were what they were. With a vocabulary confined to only 3326 words, their lyrics probably focused on a particular theme or wanted to convey something very specific.

It should be noted that speech generally has a lower lexical density than written work (Moby Dick has a lexical density of 12.29%), so the figure says something about the style of the Beatles' lyrics. The lower lexical density suggests that their songs were written as if they were being spoken informally to someone, unlike a song such as Bohemian Rhapsody, which has a really high lexical density of 0.422 even though it has more than 350 words.

Looking at the word lengths in their lyrics, I found that the average word was 3.77 letters long. Again they seemed to favor shorter words: words of 1 to 5 letters account for 85% of the total.
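
Here is a minimal sketch of how such word-length figures can be computed. It assumes that a word's length is its number of letters (apostrophes are not counted) and uses a made-up sample sentence rather than the actual lyrics. The word_length_stats helper is my own naming.

```python
import re
from collections import Counter

def word_length_stats(text, short_max=5):
    """Return (length distribution, mean length, share of short words)."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Count letters only, so the apostrophe in "don't" is not counted.
    lengths = [sum(ch.isalpha() for ch in word) for word in words]
    if not lengths:
        return Counter(), 0.0, 0.0
    distribution = Counter(lengths)
    mean_length = sum(lengths) / len(lengths)
    short_share = sum(1 for n in lengths if n <= short_max) / len(lengths)
    return distribution, mean_length, short_share

# Made-up sample text, not a real lyric.
sample = "you know i love you and i want you to understand"
distribution, mean_length, short_share = word_length_stats(sample)
print(sorted(distribution.items()))  # [(1, 2), (2, 1), (3, 4), (4, 3), (10, 1)]
print(round(mean_length, 2))         # 3.45
print(round(short_share, 2))         # 0.91
```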

Fig. Word length distribution

At this point I guessed that there would probably be a pattern in how they combined these short words to convey feelings, and boy, that turned out to be the right hunch. Finding the phrases the Beatles used most frequently seemed like a good way to check. So I extracted two- and three-word collocations from their lyrics (technically called bi-grams and tri-grams, or n-grams more generally). The results turned out much as expected.
Fig. Most common bi-grams

Phrases with the you-and-me thing going topped the list. “You know” (336 occurrences) and “love you” (252 occurrences), followed by “I want” (241 occurrences), gave the other bi-grams a run for their money. This confirmed that the Beatles had some kind of general theme going on in their lyrics. The tri-grams had a similar story to tell: phrases like “what I want” and “I love you” were the most common tri-gram collocations.

Fig. Most common tri-grams
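
The bi-gram and tri-gram counts above can be reproduced in spirit with plain frequency counting. The sketch below uses only the standard library rather than nltk's collocation tools, counts n-grams line by line (an assumption: the original may have counted across line breaks), and runs on made-up lines instead of the real corpus. The helper names are my own.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a simple stand-in for nltk's tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))

def count_ngrams(lines, n):
    """Count n-grams line by line so phrases never span two lines."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(tokenize(line), n))
    return counts

# Made-up lines for illustration, not real lyrics.
lines = ["you know i love you", "i love you so", "you know i want you"]
bigrams = count_ngrams(lines, 2)
trigrams = count_ngrams(lines, 3)
print(bigrams[("love", "you")])        # 2
print(trigrams[("i", "love", "you")])  # 2
print(bigrams.most_common(3))
```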

The most common word in the lyrics was “you” with 3113 appearances, no surprises there. Next came “I” with 2375 appearances, followed by “me”, “love” and “my”. What I figured from this was that the Beatles’ lyrics in general had the “you” and “I” feeling with sprinkles of “I wanna” anarchy here, there and everywhere.

The most common variations of “you” were “you know”, “if you”, “I love you”, “I want you”, “you can”, “love you love” and “with you”. You could write lyrics such as “you know I love you” and plan to spend weeks on the charts.
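
One way to find the variations of a given word is to count the words themselves and then keep only the two- and three-word phrases that contain the target. This is a minimal sketch on made-up lines; the word_counts and phrases_with helpers are my own naming, not part of nltk.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a simple stand-in for nltk's tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def word_counts(lines):
    """Frequency of every word across all lines."""
    return Counter(token for line in lines for token in tokenize(line))

def phrases_with(lines, target, sizes=(2, 3)):
    """Count the n-word phrases (n in sizes) that contain the target word."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for n in sizes:
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if target in gram:
                    counts[" ".join(gram)] += 1
    return counts

# Made-up lines for illustration, not real lyrics.
lines = ["you know i love you", "you know if you want you can"]
print(word_counts(lines)["you"])  # 5
phrases = phrases_with(lines, "you")
print(phrases["you know"])        # 2
print(phrases["i love you"])      # 1
```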

Taking the analysis one step further, I ran sentiment analysis on individual songs. Songs with lower lexical density tended to have a positive sentiment, which could be an effect of repeating commonly occurring words in the lyrics. More than three quarters of the songs had a positive sentiment, and most of those were strongly positive. There were also almost no mildly negative songs: as the plot shows, the negative songs were biased towards a strongly negative sentiment rather than settling for a mildly negative score. The lump of points near sentiment = 1.0 stands out, with 102 songs scoring +0.9 or more.
Fig. Sentiment and lexical density of individual songs
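
Tallies like the ones above can be produced from per-song scores with a small summary function. The sketch below assumes the sentiment API returns one score per song in the range -1 to 1. The 0.9 cut-off for "strong" comes from the text, while the 0.3 cut-off for "mild" and all the song names and scores are invented for illustration.

```python
def summarize_sentiment(scores, strong=0.9, mild=0.3):
    """Summarize per-song sentiment scores (assumed to lie in [-1, 1]).

    scores maps a song title to its score. The mild threshold is an
    assumption; the post does not say where "mild" ends.
    """
    values = list(scores.values())
    if not values:
        raise ValueError("no scores to summarize")
    return {
        "positive_share": sum(s > 0 for s in values) / len(values),
        "strongly_positive": sum(s >= strong for s in values),
        "mildly_negative": sum(-mild <= s < 0 for s in values),
        "strongly_negative": sum(s <= -strong for s in values),
    }

# Hypothetical scores, not results from the real analysis.
scores = {
    "Song A": 0.95,
    "Song B": 0.92,
    "Song C": 0.40,
    "Song D": -0.97,
    "Song E": 0.99,
}
summary = summarize_sentiment(scores)
print(summary["positive_share"])     # 0.8
print(summary["strongly_positive"])  # 3
print(summary["mildly_negative"])    # 0
print(summary["strongly_negative"])  # 1
```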

In the end
The Beatles' lyrics lean positive, as can be seen from the sentiment analysis. In general I can safely say that they wrote positive lyrics, but it would be exciting to analyze the sentiment of their albums as a timeline, in order of release, to get better insight into what was on their minds during the various phases leading up to their success.

Probably every album had a positive note overall, with some negative sentiment in the form of a song or two. It would be exciting to see whether their break-up could be felt in the lyrics they wrote, given the emotions in the songs of the 1969 album Abbey Road, especially 'The End'. Could the lyrics have somehow signaled the inevitable break-up? Such a question would require a more rigorous analysis of their lyrics, and I leave that for a future post.

The Beatles kept it really simple: variations of short, simple words and a confined vocabulary seemed to be the formula. Adding positive emotion through phrases that displayed affection and conveyed a sense of belonging helped people connect with their songs. Indeed, ‘Love is all you need!’

I plan to write a future post with a more detailed analysis based on the Beatles' discography in chronological order to get some more insight. So stay tuned and subscribe or get all the updates on the Facebook page of Gaussian Geek.



  1. Nice work. Is your code available on GitHub?

    1. Hey Dale,
      I haven't pushed a repo on GitHub as of now. I am working on analyzing lyrics according to the time they were written for a future post. But once I am done I will post the link to the GitHub profile on my blog.
      Thanks for the concern.

  2. The point is, your outcomes are worth nothing without a comparison to other corpora. Consider your notion that the lexical density of the lyrics is low: 0.05 isn't really low for a corpus that sticks to a single topic (love). I'm sure you would find such a value for a great many bands. Your comparison with Bohemian Rhapsody is poor, because there you calculate the lex. dens. of only 350 words, while here you do it for 65000 words. Usually the lex. dens. declines as the number of tokens increases. That is, you cannot simply compare a whole corpus of songs with a single song.

    Furthermore, your finding that the Beatles prefer short words (3.77 letters) isn't a big one. It is true of almost any text. The reason is that function words such as determiners, pronouns and so on account for nearly 50% of a text, and they are mostly short. Additionally, common nouns, verbs and adverbs are often short too. This percentage shifts depending on genre, topic and so on.

    And yes, I would agree with your observation that the lyrics of the Beatles are in a spoken rather than a written style. But show me pop music lyrics with elaborate language use.

  3. What was that 16-letter word?

    1. I think it was 'misunderstanding' in the song Strawberry Fields Forever.
      'Living is easy with eyes closed
      Misunderstanding all you see...'