Language modelling to person modelling?

With the pandemic raging through our lives, we struggle to define our "new-normal" sense of well-being. Restricted social gatherings and limited meetings with loved ones have left some of us longing for companions to share our thoughts with, to vent to, or simply to live in the moment with.

Given that artificial intelligence is becoming an omnipresent force solving problems all around us, can it be harnessed to give us some sense of belonging?

With this intention in mind, I set out to create an intelligent bot that is able to respond to me, like a person would. In short, the Jarvis to my Iron Man.

After cleaning the text by removing stop words and punctuation and then tokenizing it, let's take a look at the top five most common bigrams used by the two speakers (Trump and Biden).
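A rough sketch of that cleaning and bigram-counting step is shown below; the use of NLTK and the transcript file name are assumptions, not necessarily what was used here.

```python
# Sketch: clean a transcript and count its most common bigrams.
# NLTK and the file path below are assumptions.
import string
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

STOP_WORDS = set(stopwords.words("english"))

def top_bigrams(text, n=5):
    # Lowercase and tokenize, then drop stop words, punctuation and numbers.
    tokens = [
        tok for tok in word_tokenize(text.lower())
        if tok not in STOP_WORDS and tok.isalpha()
    ]
    return Counter(zip(tokens, tokens[1:])).most_common(n)

# Hypothetical usage with one speaker's transcript:
# with open("trump_transcript.txt") as f:
#     print(top_bigrams(f.read()))
```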

While we did hear a lot of New York talk from Trump, Biden’s style of enumerating his points is evident too.

So far so good!

I set out to design my person model by teaching it to predict the next word, given a sequence of words.

LSTM model for language modelling
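For reference, a next-word LSTM of this kind can be put together in Keras roughly as below; the layer sizes and vocabulary size are placeholders rather than the values actually used.

```python
# Sketch of a next-word LSTM language model in Keras.
# VOCAB_SIZE, EMBED_DIM and the LSTM width are placeholders.
import tensorflow as tf

VOCAB_SIZE = 10_000   # placeholder vocabulary size
SEQ_LEN = 10          # input sequence length (see the cap discussed below)
EMBED_DIM = 300       # Word2vec-sized embeddings

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),  # distribution over the next word
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Training pairs for such a model are typically built by sliding a window over each speech: a run of SEQ_LEN words is the input, and the word that follows is the target.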

Some bold decisions.

Word2vec for word embeddings
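One common way to plug pretrained Word2vec vectors into the embedding layer is via gensim; which pretrained model was actually used isn't stated, so the Google News vectors below are an assumption.

```python
# Sketch: build an embedding matrix from pretrained Word2vec vectors.
# gensim and the "word2vec-google-news-300" model are assumptions.
import numpy as np
import gensim.downloader as api

w2v = api.load("word2vec-google-news-300")   # 300-dimensional vectors

def build_embedding_matrix(word_index, dim=300):
    # word_index maps each vocabulary word to an integer id (e.g. from a tokenizer).
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, idx in word_index.items():
        if word in w2v:
            matrix[idx] = w2v[word]
    return matrix

# The matrix can then initialise a frozen Keras Embedding layer:
# tf.keras.layers.Embedding(matrix.shape[0], 300, weights=[matrix], trainable=False)
```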

Maximum sequence length

The average sentence length for the two speakers was around 10.67 and 14.02 words respectively. However, I decided to cap the sequence length at 10, since the goal of the model was to make sensible sentences, not necessarily long ones.
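In Keras this cap is just a pad/truncate step over the token-id sequences, something like the following sketch.

```python
# Sketch: pad or truncate every token-id sequence to the chosen cap of 10.
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 10

# Hypothetical token-id sequences of varying length:
sequences = [[4, 87, 12], [9, 3, 15, 2, 44, 8, 61, 5, 23, 90, 7, 31]]

padded = pad_sequences(sequences, maxlen=MAX_LEN)
print(padded.shape)   # (2, 10)
```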

Cosine similarity for loss

Because words like citizens and people are synonyms, I thought it was unfair to penalize the model for using one in place of the other. Since the vector representations of such words have high similarity, cosine similarity seemed like a reasonable choice for the loss function.
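One way to wire that up, assuming the model predicts a 300-dimensional word vector and is trained against the Word2vec vector of the true next word, is Keras's built-in CosineSimilarity loss; shapes and layer sizes below are assumptions.

```python
# Sketch: swap the softmax head for a layer that predicts a word vector, and
# train with cosine similarity against the true next word's Word2vec vector.
import tensorflow as tf

VOCAB_SIZE, SEQ_LEN, EMBED_DIM = 10_000, 10, 300

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(EMBED_DIM),   # predicted word vector, not a softmax
])

# CosineSimilarity returns -1 for perfectly aligned vectors, so minimizing the
# loss maximizes the similarity between prediction and target.
model.compile(optimizer="adam", loss=tf.keras.losses.CosineSimilarity(axis=-1))
```

At prediction time, the output vector would then be mapped back to a word by a nearest-neighbour lookup in the Word2vec space (e.g. gensim's similar_by_vector).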

Spoiler: the model did not do well, either in terms of accuracy (around 50%) or in terms of its predictions. This can probably be attributed to the model's large vocabulary (inherited from the pretrained Word2vec model) relative to the amount of input data. Just a guesstimate!

Using a smaller sequence length of 4 and dropping Word2vec, the model seemed to perform much better. The text in blue is the input to the model (snippets that appear in the training text). The output is highlighted in orange if it appears in the training text, and in green if it does not.

Sample output

While the output is not very impressive yet, the model did manage to learn that white is generally followed by house, tax by cuts, law by enforcement, and so on.

This is indeed the tip of the iceberg. The model needs to be trained and evaluated on larger datasets but that’s for another time. Until then, happy coding in a parallel world!
