PerleyML Bag Of Words

Matt Perley
2 min read · May 22, 2021


Bag of words (BOW) is used frequently in machine learning. It converts data such as text or images into numeric data that a neural net or other algorithm can work with. PerleyML can currently convert strings into their numeric representation, so that we can then cluster and classify them.

Figure: BOW usage in PerleyML.

As you can see above, creating a bag of words is easy and straightforward. The user passes in the characters they want to filter out, and the class takes care of the rest. It outputs a dictionary whose entries pair each word with the number of times it is used in the dataset. From there, the dataset can be broken down into a jagged int[] (integer array).
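As a rough illustration of the idea, here is a minimal Python sketch. It is not PerleyML's API: the names `build_bag_of_words`, `vectorize`, and `filter_chars` are hypothetical, and lowercasing the text and splitting on whitespace are assumptions made for the example.

```python
from collections import Counter

def build_bag_of_words(documents, filter_chars=".,!?;:"):
    """Count how often each word appears across all documents."""
    # Mapping each filtered character to None makes str.translate delete it.
    strip_table = str.maketrans("", "", filter_chars)
    counts = Counter()
    for doc in documents:
        cleaned = doc.lower().translate(strip_table)  # assumption: case-insensitive
        counts.update(cleaned.split())                # assumption: whitespace tokens
    return dict(counts)

def vectorize(documents, vocabulary, filter_chars=".,!?;:"):
    """Turn each document into a list of vocabulary indices.

    Documents differ in length, so the result is jagged.
    """
    strip_table = str.maketrans("", "", filter_chars)
    index = {word: i for i, word in enumerate(vocabulary)}
    return [
        [index[w] for w in doc.lower().translate(strip_table).split() if w in index]
        for doc in documents
    ]

docs = ["The cat sat.", "The dog sat, the cat ran!"]
bag = build_bag_of_words(docs)   # word -> number of occurrences
rows = vectorize(docs, list(bag))  # one list of word indices per document
print(bag)
print(rows)
```

Here `bag` plays the role of the dictionary described above, and `rows` is the jagged integer structure, one row per input string.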

Figure: Example with clustering.

Above you can see the data from the Bag of Words being used by a KMeans and a KNN class. In this example I add up the elements of each row and append the sum to a list called wordCounts. Then, purely for the sake of the example, we divide each nval (wordCount) by the number of possible words. This gives us our x and y values, which can then be passed directly into the clustering algorithms.
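The steps above can be sketched in Python as follows. This is one possible reading of the description, not PerleyML's KMeans class: it assumes x is the summed value of a row and y is that sum divided by the vocabulary size, and the functions `to_points`, `nearest_centroid`, and `kmeans` are hypothetical helpers written for this example.

```python
def to_points(rows, vocab_size):
    """Sum each row, then scale the sum by the number of possible words."""
    word_counts = [sum(row) for row in rows]
    # Assumption: x is the raw sum, y is the sum divided by the vocabulary size.
    return [(float(n), n / vocab_size) for n in word_counts]

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (squared distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: (point[0] - centroids[c][0]) ** 2
                    + (point[1] - centroids[c][1]) ** 2,
    )

def kmeans(points, k, iterations=10):
    """A bare-bones k-means, seeded with the first k points for repeatability."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels

rows = [[0, 1, 2], [0, 3, 2, 0, 1, 4], [1, 1], [4, 4, 3, 2]]
points = to_points(rows, vocab_size=5)
centroids, labels = kmeans(points, k=2)
print(points)
print(labels)
```

Seeding the centroids with the first k points keeps the sketch deterministic; a real implementation would more likely pick random or k-means++ starting points.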

There is really not much more to the bag of words. As stated before, it currently only takes in string data, but there are plans to change that in the future.

*This article will be updated.*
