Straight Outta Compton: Simple Lyric Analysis

It has been slightly over 27 years since NWA’s Straight Outta Compton was released on August 9th, 1988. The groundbreaking album contained 13 tracks spanning just over 60 minutes of beats and (at the time) unthinkable lyrics. The group's six members, Eazy-E, Dr. Dre, Ice Cube, MC Ren, DJ Yella, and Arabian Prince, could hardly have foreseen how important their album would become.

Although NWA’s lyrics depicted the realities of street life in Compton, California, they were met with sharp criticism for their profanity, sexism, and violence. Straight Outta Compton received little to no airplay but still went triple platinum, helped define the genre of hardcore gangsta rap, and eventually spawned empires spanning multiple record labels, the movie industry, and even Apple technology. The group and its activities would shine a light on crooked record executives and on the gang violence that came to dominate the later East Coast/West Coast rivalry, and would help launch several future artists (Eminem, Bone Thugs-n-Harmony, and 50 Cent, to name a few) across multiple record labels.

In this article, we take a look at the distribution of words in NWA’s Straight Outta Compton. Using the lyrics published by AZLyrics and MetroLyrics, the 13-track album contains a total of 9,523 words. That works out to 2.63 words per second, and the most common lyrics are… well… the most common.
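The counting step itself is simple. Below is a minimal sketch, assuming the lyrics are available as plain text and that a regular expression is an acceptable tokenizer; the `tokenize` and `words_per_second` helpers are hypothetical and are not the code used for this article. Dividing the word count by the running time gives the words-per-second figure, and working backwards from 9,523 words at 2.63 words per second gives a running time of roughly 3,621 seconds.

```python
import re

# Hypothetical tokenizer (an assumption, not the article's code): upper-cases
# the text and keeps apostrophes/hyphens inside a word (HERE'S, T-SHIRT, EAZY-E).
WORD = re.compile(r"[A-Z0-9]+(?:['’\-][A-Z0-9]+)*")

def tokenize(text: str) -> list[str]:
    """Split lyrics into upper-case word tokens."""
    return WORD.findall(text.upper())

def words_per_second(text: str, duration_s: float) -> float:
    """Lyric density: number of word tokens divided by running time in seconds."""
    return len(tokenize(text)) / duration_s

print(tokenize("Here's a lil gangsta, short in size."))
# Working backwards from the article's totals: 9,523 words at 2.63 words per second.
print(round(9523 / 2.63))  # 3621 seconds, just over 60 minutes
```

Different tokenizing rules (for example, splitting EAZY-E into two words) will shift the totals slightly, so counts from this sketch need not match the article's exactly.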

Most Common Words

The most common lyrics in Straight Outta Compton are indeed the most common words in the English language. The most common lyric is the word THE, with 458 instances, or 4.81% of all lyrics. The next ten most common lyrics, all but the last of which cover more than 1.00% of the album, are:

  1. A (346 instances : 3.63%)
  2. AND (262 instances : 2.75%)
  3. YOU (236 instances : 2.48%)
  4. (229 instances : 2.40%)
  5. TO (225 instances : 2.36%)
  6. IT (145 instances : 1.52%)
  7. I’M (141 instances : 1.48%)
  8. IN (129 instances : 1.35%)
  9. IS (100 instances : 1.05%)
  10. BUT (94 instances : 0.99%)
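A ranked list like the one above can be produced with a frequency counter. This is a minimal sketch, assuming the same hypothetical regex tokenizer as before; `top_words` is an illustrative helper, and the percentages are each word's count divided by the total number of tokens.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the article's code).
WORD = re.compile(r"[A-Z0-9]+(?:['’\-][A-Z0-9]+)*")

def tokenize(text):
    return WORD.findall(text.upper())

def top_words(text, n=10):
    """Return (word, count, percent of all tokens) for the n most frequent words."""
    tokens = tokenize(text)
    total = len(tokens)
    return [(word, count, round(100 * count / total, 2))
            for word, count in Counter(tokens).most_common(n)]

sample = ("A t-shirt and Levi's is his only disguise. "
          "Ice Cube and Eazy E cold runnin shit.")
for word, count, pct in top_words(sample, 3):
    print(f"{word} ({count} instances : {pct:.2f}%)")
```

On this two-line sample, AND comes out on top with 2 of 16 tokens (12.50%). `Counter.most_common` breaks ties by first appearance, so words with equal counts keep their order of first use.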

The majority of the top 50 lyrics are common words such as OF, GO, and NOW. There is nothing exceptionally exciting in this department, so let’s look at more exciting topics.

Distribution of Profanity

Warning: This section will have profanity in it. 

For this analysis, the five standard profane words are FUCK, SHIT, DAMN, BITCH, and ASS. Profanity is definitely apparent in the lyrics of Straight Outta Compton, so here we look at its distribution.

On its own, the word FUCK appears 86 times, making it the 13th most common lyric on the album at 0.90% of all lyrics. However, the word also appears in other variants. Counting all of them, there are 166 instances of FUCK, which makes it the 7th most common lyric on the album at 1.74%, or once every 21.85 seconds. Compare this to the rate in the movies Goodfellas (once every 29.27 seconds), The Big Lebowski (once every 27.03 seconds), and The Wolf of Wall Street (once every 18.98 seconds), and we find that the album sits squarely within the range of these famously profane films. However, of the movies released by 1988, only Eddie Murphy’s Raw (once every 24.29 seconds) comes close to Straight Outta Compton’s profanity. Small wonder, then, that the lyrics surprised so many listeners on first listen.
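The article does not list which variants were grouped together, so the sketch below shows two hypothetical ways of doing it: matching any token that contains the stem, or matching only an explicit list of variants. `count_variants` is an illustrative helper, and the regex tokenizer is an assumption.

```python
import re

# Hypothetical tokenizer (an assumption, not the article's code).
WORD = re.compile(r"[A-Z0-9]+(?:['’\-][A-Z0-9]+)*")

def tokenize(text):
    return WORD.findall(text.upper())

def count_variants(tokens, stem, variants=None):
    """Count tokens treated as variants of `stem`.

    With an explicit `variants` list, only exact matches are counted.
    Otherwise any token containing the stem is counted. That works for
    long stems but over-counts short ones (HO would also match HOUSE).
    """
    if variants is not None:
        wanted = {v.upper() for v in variants}
        return sum(1 for t in tokens if t in wanted)
    stem = stem.upper()
    return sum(1 for t in tokens if stem in t)

tokens = tokenize("Gangsta, gangstas, and a house full of hos")
print(count_variants(tokens, "gangsta"))                     # substring match: 2
print(count_variants(tokens, "ho"))                          # too loose, also hits HOUSE: 2
print(count_variants(tokens, "ho", variants=["ho", "hos"]))  # explicit list: 1
```

Substring matching is convenient for a long stem such as FUCK, where compound forms should count. An explicit variant list is safer for short stems.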

So how do the other profane words stack up? SHIT appears 51 times, at a rate of once every 71.11 seconds. BITCH appears only 24 times, or once every two minutes 31 seconds. DAMN appears 11 times and ASS appears 53 times.

In total, there are 305 instances of the five main profane words. This is 3.20% of the total lyrics in Straight Outta Compton, at a rate of roughly once every 11.9 seconds.
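The rates above are just running time divided by count. The sketch below reproduces that arithmetic from the counts in this section. Because the exact running time isn't stated, it estimates the running time from the article's own figures (9,523 words at 2.63 words per second). That estimate is an assumption, so the results can differ from the quoted rates in the last digit.

```python
# Counts (all variants) of the five main profane words, as reported above.
COUNTS = {"FUCK": 166, "SHIT": 51, "DAMN": 11, "BITCH": 24, "ASS": 53}
TOTAL_WORDS = 9523
# Assumption: running time estimated from 9,523 words at 2.63 words per second.
DURATION_S = TOTAL_WORDS / 2.63

def seconds_per_occurrence(count, duration_s=DURATION_S):
    """Average gap between occurrences if they were spread evenly over the album."""
    return duration_s / count

def as_min_sec(seconds):
    """Format a gap as 'M min S s', rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s"

for word, count in COUNTS.items():
    gap = seconds_per_occurrence(count)
    print(f"{word:<6}{count:>4}  every {gap:7.2f} s  ({as_min_sec(gap)})")

total = sum(COUNTS.values())
print(f"TOTAL {total:>4}  every {seconds_per_occurrence(total):7.2f} s, "
      f"{100 * total / TOTAL_WORDS:.2f}% of all words")
```

The last line shows why the combined rate must be well above the single-word rate for FUCK. With 305 occurrences spread over about 3,620 seconds, the average gap is just under 12 seconds.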

Distribution of Sexism

Warning: This section will have profanity in it. 

Because Straight Outta Compton has been criticized for the sexism in its lyrics, we also take a look at the distribution of sexist words, such as HO and BITCH. However, we will not be including the word MAD, as MC Ren indicates it is something that is supposed to happen. We will also include homophobic words in the count.

The most frequent sexist word in Straight Outta Compton is HO, with a total of 173 instances, or once every 20.97 seconds. Because the word is split across several variants, no single form of it reaches the top of the most-common-words list. With all variants combined, however, it lands at 7th overall, pushing FUCK down to 8th.

In terms of homophobic words, FAG and all of its variants appear only once in the entire album, a mildly surprising artifact. In total, there are 198 sexist words on the album. BITCH appears on both lists, so counting it only once gives a grand total of 479 profane words (swearing and sexism), or 5.03% of the album. That is a relatively modest share by the standards of modern media.
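The grand total only works out if a word that falls in both categories is counted once. The sketch below uses the counts reported in this article. It also assumes that HO, BITCH, and FAG make up the whole sexist list. The article does not say so explicitly, but 173 + 24 + 1 = 198 and 305 + 198 - 24 = 479 are both consistent with it. `combined_total` is a hypothetical helper.

```python
# Per-word counts (all variants) reported in the article.
PROFANITY = {"FUCK": 166, "SHIT": 51, "DAMN": 11, "BITCH": 24, "ASS": 53}
SEXIST = {"HO": 173, "BITCH": 24, "FAG": 1}  # assumed to be the full sexist list
TOTAL_WORDS = 9523

def combined_total(*categories):
    """Total occurrences across categories, counting a word listed in several only once."""
    merged = {}
    for category in categories:
        for word, count in category.items():
            merged[word] = count  # same word, same count: overwriting avoids double counting
    return sum(merged.values())

grand_total = combined_total(PROFANITY, SEXIST)
print(sum(PROFANITY.values()), sum(SEXIST.values()), grand_total)  # 305 198 479
print(f"{100 * grand_total / TOTAL_WORDS:.2f}%")                   # 5.03%
```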

Compton Love

So how often is Compton represented in Straight Outta Compton? In total, COMPTON is referenced a mere 29 times, and the city of LOS ANGELES is never mentioned at all. COMPTON appears in only 6 of the 13 tracks, most notably making no appearance in Express Yourself, Fuck the Police, Gangsta Gangsta, and I Ain’t The One.
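Counting both total mentions and the number of tracks that mention a word requires keeping the lyrics split by track. This is a minimal sketch, assuming a hypothetical `tracks` dictionary that maps track titles to lyric text. The placeholder lyrics below are invented for illustration and are not the album's.

```python
import re

# Hypothetical tokenizer (an assumption, not the article's code).
WORD = re.compile(r"[A-Z0-9]+(?:['’\-][A-Z0-9]+)*")

def tokenize(text):
    return WORD.findall(text.upper())

def track_presence(tracks, word):
    """Return (total occurrences, titles of the tracks that mention `word` at least once)."""
    word = word.upper()
    total, present = 0, []
    for title, lyrics in tracks.items():
        n = tokenize(lyrics).count(word)
        total += n
        if n:
            present.append(title)
    return total, present

# Hypothetical stand-in lyrics; the real album text is not reproduced here.
tracks = {
    "Track A": "Compton, Compton, straight outta Compton",
    "Track B": "Nothing about the city here",
    "Track C": "Back in Compton",
}
total, present = track_presence(tracks, "compton")
print(total, f"{len(present)} of {len(tracks)} tracks:", present)
```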

We Want Eazy

How often is each member of NWA referred to? We can check that as well. If you guessed that ARABIAN PRINCE received the most references, you’d be wrong. That honor belongs to the Ruthless Villain, MC REN, who is referenced 44 times on Straight Outta Compton. He wasn't viewed as the most prominent member during NWA’s run, nor as one of the two members who went on to dominate the rap industry afterward. Even so, MC REN makes his mark in songs such as If It Ain’t Ruff and Compton Is In The House.

The second most frequent NWA member callout is to EAZY-E, with 21 instances. ICE CUBE follows in third with 17 references, and DR DRE comes in fourth with 14. The remaining long-standing member of NWA, DJ YELLA, has 9 references, while ARABIAN PRINCE manages only three in the entire album. He has only one main track on the album (Something 2 Dance 2), yet his three references are spread across three separate tracks.
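Several member names span more than one word (ICE CUBE, DR DRE, DJ YELLA) or have alternate spellings (EAZY E versus EAZY-E). Counting them therefore means matching token sequences rather than single tokens. The sketch below assumes the same hypothetical regex tokenizer. The alias lists passed to `count_member` are illustrative assumptions, and aliases that overlap (for example MC REN and REN) would be double counted.

```python
import re

# Hypothetical tokenizer (an assumption, not the article's code).
WORD = re.compile(r"[A-Z0-9]+(?:['’\-][A-Z0-9]+)*")

def tokenize(text):
    return WORD.findall(text.upper())

def count_phrase(tokens, phrase):
    """Count occurrences of a (possibly multi-word) phrase in a token list."""
    target = tokenize(phrase)
    if not target:
        return 0
    k = len(target)
    return sum(1 for i in range(len(tokens) - k + 1) if tokens[i:i + k] == target)

def count_member(tokens, aliases):
    """Sum counts over a member's aliases (hypothetical alias lists; they must not overlap)."""
    return sum(count_phrase(tokens, alias) for alias in aliases)

tokens = tokenize("Ice Cube and Eazy E cold runnin shit. Eazy-E is in the house.")
print(count_phrase(tokens, "Ice Cube"))            # 1
print(count_member(tokens, ["Eazy E", "Eazy-E"]))  # 2: both spellings
```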

Top 10 Uncommon Nouns

An interesting note on Straight Outta Compton is its list of the most frequent uncommon nouns. The two most common uncommon nouns are, somewhat hilariously, ASS and SHIT, with 53 and 51 instances respectively. REN comes in third with 44 instances, and POLICE clocks in fourth with 31. COMPTON comes in fifth with 29 instances, while MONEY and GANGSTA tie for sixth with 23 each. The final three of the top ten are EAZY-E (21 times), ICE CUBE (17 times), and HOUSE (16 times).

Taken together, these uncommon nouns make sense given songs such as Straight Outta Compton, Fuck the Police, and Gangsta Gangsta.
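The article doesn't say exactly how uncommon nouns were picked out. A rough first pass is to drop common function words before counting, as in the sketch below. The stopword list is a tiny hypothetical one. This only filters words and does not do part-of-speech tagging, so keeping only nouns would need a tagger (such as NLTK's `pos_tag`) or a manual pass.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the article's code).
WORD = re.compile(r"[A-Z0-9]+(?:['’\-][A-Z0-9]+)*")

# A tiny, hypothetical stopword list; a real analysis would use a much longer one.
STOPWORDS = {"THE", "A", "AND", "YOU", "TO", "IT", "I'M", "IN", "IS", "BUT", "OF", "GO", "NOW"}

def tokenize(text):
    return WORD.findall(text.upper())

def top_content_words(text, n=10, stopwords=STOPWORDS):
    """Most frequent words once common function words are filtered out."""
    return Counter(t for t in tokenize(text) if t not in stopwords).most_common(n)

print(top_content_words("The police and the money, and the police in Compton", n=2))
```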

Ordering of Lyrics

WARNING: There’s math coming up…

For NWA tracks, we can build a matrix of word transitions trained on Straight Outta Compton. The rows of the transition matrix are the first words of two-word pairs, while the columns are the words that follow them. Since there are 1,719 unique words in Straight Outta Compton, the matrix is 1,720 by 1,720. The rows are indexed by the 1,719 unique words plus the beginning of a sentence, and the columns by the 1,719 unique words plus the end of a sentence.

As a simple example, consider the following lyrics:

Here’s a lil gangsta, short in size. A t-shirt and Levi’s is his only disguise. Built like a tank, yet hard to hit. Ice Cube and Eazy E cold runnin shit.

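Here, there are 28 unique words, with four sentence beginnings and four sentence endings, so the resulting matrix is 29 by 29. The entries of the matrix are the empirical transition probabilities. Each entry is the number of times the column word follows the row word, divided by the total number of transitions out of the row word.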

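For example, the row indexed by AND has two nonzero entries, 0.5 each, in the columns indexed by LEVI’S and EAZY. The column ENDSENTENCE is populated in four different rows (SIZE, DISGUISE, HIT, and SHIT), each with probability 1, since each of those words appears once and always ends its sentence. The 0.25 entries belong to the BEGINSENTENCE row, which splits evenly among the four sentence openers HERE’S, A, BUILT, and ICE.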
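The example can be checked directly. The sketch below builds the row-normalized transition probabilities for the four-line lyric. It stores only nonzero entries in a dictionary of dictionaries, since a dense 1,720 by 1,720 matrix for the full album would be almost entirely zeros. It assumes a naive sentence splitter (split on `.`, `!`, `?`) and the same hypothetical regex tokenizer. The `<BEGINSENTENCE>`/`<ENDSENTENCE>` marker strings are illustrative names.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer and marker names (assumptions, not the article's code).
WORD = re.compile(r"[A-Z0-9]+(?:['’\-][A-Z0-9]+)*")
BEGIN, END = "<BEGINSENTENCE>", "<ENDSENTENCE>"

def tokenize(text):
    return WORD.findall(text.upper())

def split_sentences(text):
    """Naive sentence splitter: assumes each lyric sentence ends in '.', '!' or '?'."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def transition_probabilities(text):
    """Sparse, row-normalized transition matrix of a first-order (word-to-word) Markov chain.

    Rows: BEGIN plus every word. Columns: every word plus END.
    Only nonzero entries are stored; missing entries are 0.
    """
    counts = defaultdict(Counter)
    for sentence in split_sentences(text):
        words = [BEGIN] + tokenize(sentence) + [END]
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    matrix = {}
    for row, cols in counts.items():
        total = sum(cols.values())
        matrix[row] = {col: n / total for col, n in cols.items()}
    return matrix

LYRIC = ("Here's a lil gangsta, short in size. A t-shirt and Levi's is his only disguise. "
         "Built like a tank, yet hard to hit. Ice Cube and Eazy E cold runnin shit.")

P = transition_probabilities(LYRIC)
vocab = set(tokenize(LYRIC))
print(len(vocab))   # 28 unique words -> a 29 x 29 matrix
print(P["AND"])     # {"LEVI'S": 0.5, 'EAZY': 0.5}
print(P[BEGIN])
print(P["SIZE"])
```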

We then start with the BEGINSENTENCE mark and draw a random number between 0 and 1. The cumulative probabilities along that row divide the interval into bins, one per column, and the bin the number lands in determines the next word. We then take the row of that word and repeat the process until we reach ENDSENTENCE or are satisfied. Let’s use this basic template to draw a random three-line lyric.
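The sampling step described above is inverse-CDF sampling over one row at a time. Here is a minimal sketch, assuming the sparse `{row: {column: probability}}` layout with illustrative `<BEGINSENTENCE>`/`<ENDSENTENCE>` markers. It uses a tiny toy table instead of the real album matrix, and `next_word` and `generate_line` are hypothetical helpers.

```python
import bisect
import itertools
import random

BEGIN, END = "<BEGINSENTENCE>", "<ENDSENTENCE>"

def next_word(row, rng):
    """Draw u in [0, 1) and return the column whose cumulative-probability bin contains u."""
    columns = list(row)
    cumulative = list(itertools.accumulate(row[c] for c in columns))
    u = rng.random() * cumulative[-1]  # rescale in case rounding leaves the row sum at 0.999...
    return columns[bisect.bisect_right(cumulative, u)]

def generate_line(P, rng, max_words=30):
    """Walk the chain from BEGIN until END (or max_words) and join the words."""
    word, line = BEGIN, []
    while len(line) < max_words:
        word = next_word(P[word], rng)
        if word == END:
            break
        line.append(word)
    return " ".join(line)

# Toy transition table in the same sparse {row: {column: probability}} form.
P = {
    BEGIN: {"ICE": 0.5, "EAZY": 0.5},
    "ICE": {"CUBE": 1.0},
    "CUBE": {END: 1.0},
    "EAZY": {"E": 1.0},
    "E": {END: 1.0},
}

rng = random.Random(1988)
for _ in range(3):  # a three-line "lyric"
    print(generate_line(P, rng))
```

Python's `random.choices(columns, weights=...)` performs the same weighted draw in one call. The explicit cumulative bins are spelled out here because they mirror the description above.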


With such a small training set, there’s not much room for variation in the generated lyrics. So let’s expand our training set to all of the Straight Outta Compton lyrics.


The result doesn’t make much sense. To generate NWA lyrics at random, we might consider a more sophisticated model than a single word-to-word Markov chain. For instance, we could build topic models and generate whole phrases rather than single words.

However, these small building blocks help form a method for comparing themes and lyrics on a mathematical level. For instance, what other artists are similar to the lyricism of NWA? How do we quantify this?

These are complicated questions, but mathematical analysis can help uncover similarities that are not directly apparent to the listener. That said, mathematics will not replace more human qualities such as resonance or depth.

This type of analysis is known as n-gram analysis and is common in natural language processing. Breaking language down this way helps us identify mathematical qualities of sentences and phrases. In principle, this lets us compare lyrics and estimate how likely it is that one artist’s work is derivative of another’s, or whether an artist’s lyrics are truly inspired by other artists. This is an interesting subject, but one we will not dive into here. The goal was to show some basic natural language processing techniques in word counting and analysis.
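As one illustration of how n-grams could be used to compare two lyrics, the sketch below extracts word bigrams and scores the overlap with Jaccard similarity (shared n-grams divided by all distinct n-grams). This is a deliberately simple measure chosen for illustration, not the method behind this article. A real comparison would more likely weight n-grams by frequency, for example with cosine similarity. The tokenizer and helper names are hypothetical.

```python
import re

# Hypothetical tokenizer (an assumption, not the article's code).
WORD = re.compile(r"[A-Z0-9]+(?:['’\-][A-Z0-9]+)*")

def tokenize(text):
    return WORD.findall(text.upper())

def ngrams(tokens, n=2):
    """All contiguous n-word sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def jaccard_ngram_similarity(text_a, text_b, n=2):
    """|shared n-grams| / |all distinct n-grams|: 0 means nothing in common, 1 means identical sets."""
    a = set(ngrams(tokenize(text_a), n))
    b = set(ngrams(tokenize(text_b), n))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

print(jaccard_ngram_similarity("straight outta compton", "straight outta brooklyn"))
```

The two phrases share one bigram (STRAIGHT OUTTA) out of three distinct bigrams, so they score one third.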

If you’re interested in this technique, feel free to leave a comment and request the Python code used to trawl the NWA lyrics.


