TF-IDF Calculator Implemented in Java

In this tutorial I will show you how to implement the TF-IDF algorithm in Java. TF-IDF stands for Term Frequency-Inverse Document Frequency. It is widely used in text mining to convert text into vectors that hold the weight of each term in each document.

**Definition**

**TF(t,d) = Term Frequency(t,d):** the number of times that term *t* occurs in document *d*.
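The post's own Java code is not included in this excerpt, so here is a minimal sketch of the TF definition, not the author's implementation. It assumes each document has already been tokenized into a `List<String>`. The class name `TermFrequency` and the method `tf` are hypothetical. It returns the raw count, with no normalization by document length, matching the definition above.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch (hypothetical names): TF(t,d) = number of times term t occurs in document d.
// Assumption: the document is already tokenized into a List<String>.
class TermFrequency {

    // Counts how many tokens in the document are equal to the given term.
    static int tf(List<String> document, String term) {
        int count = 0;
        for (String token : document) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("the", "cat", "sat", "on", "the", "mat");
        System.out.println(tf(doc, "the")); // prints 2
        System.out.println(tf(doc, "dog")); // prints 0
    }
}
```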

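**IDF(t,D) = Inverse Document Frequency(t,D):** measures how informative term *t* is across the whole collection of documents *D*. A term that appears in few documents gets a high IDF, while a term that appears in nearly every document gets an IDF close to zero. We obtain this measure by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient.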

$latex IDF(t,D) = \log_2{\frac{N}{DF}}$

where *N* is the number of documents in the collection and *DF* is the number of documents in which the term *t* appears.
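A minimal sketch of the IDF formula in Java, again assuming pre-tokenized documents and using hypothetical names (`InverseDocumentFrequency`, `idf`). Java's `Math` has no base-2 logarithm, so the sketch changes base with `Math.log(x) / Math.log(2)`. The formula is undefined when *DF* = 0; returning 0 for a term that appears in no document is an assumption of this sketch, not part of the definition above.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch (hypothetical names): IDF(t,D) = log2(N / DF).
// N  = number of documents, DF = number of documents containing t at least once.
// Assumption: each document is a pre-tokenized List<String>.
class InverseDocumentFrequency {

    static double idf(List<List<String>> documents, String term) {
        int n = documents.size();
        int df = 0;
        for (List<String> doc : documents) {
            if (doc.contains(term)) {
                df++; // each document counts once, however often the term repeats in it
            }
        }
        if (df == 0) {
            return 0.0; // assumption: avoid log2(N / 0) for a term absent from every document
        }
        // Change of base: log2(x) = ln(x) / ln(2).
        return Math.log((double) n / df) / Math.log(2);
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "ran"),
                Arrays.asList("a", "bird", "sang"),
                Arrays.asList("the", "end", "came"));
        System.out.println(idf(docs, "cat")); // log2(4/1), approximately 2.0
        System.out.println(idf(docs, "the")); // log2(4/3), approximately 0.415
    }
}
```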

Finally, the weight is obtained by multiplying the two measures:

*TF-IDF(t,d) = TF(t,d) × IDF(t,D) = Term Frequency(t,d) × Inverse…*
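Putting the two measures together, here is a sketch of how the weights could be assembled into one vector per document, as described in the introduction. It is not the original post's code, which is cut off in this excerpt. It assumes raw-count TF, IDF = log2(N / DF) with 0 for unseen terms, and pre-tokenized documents. `TfIdfCalculator`, `tfIdf`, and `vectors` are hypothetical names. The vocabulary is kept in first-seen order with `LinkedHashSet` so that every document's vector uses the same term order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal sketch (hypothetical names): TF-IDF(t,d) = TF(t,d) * IDF(t,D).
// Assumptions: raw-count TF, IDF = log2(N / DF), IDF = 0 when DF = 0,
// and documents already tokenized into List<String>.
class TfIdfCalculator {

    // Raw count of the term in one document.
    static int tf(List<String> document, String term) {
        int count = 0;
        for (String token : document) {
            if (token.equals(term)) count++;
        }
        return count;
    }

    // log2(N / DF) over the whole collection.
    static double idf(List<List<String>> documents, String term) {
        int df = 0;
        for (List<String> doc : documents) {
            if (doc.contains(term)) df++;
        }
        if (df == 0) return 0.0; // assumption: guard against division by zero
        return Math.log((double) documents.size() / df) / Math.log(2);
    }

    // Weight of one term in one document.
    static double tfIdf(List<String> document, List<List<String>> documents, String term) {
        return tf(document, term) * idf(documents, term);
    }

    // One weight vector per document, keyed by the shared vocabulary in first-seen order.
    static List<Map<String, Double>> vectors(List<List<String>> documents) {
        Set<String> vocabulary = new LinkedHashSet<>();
        for (List<String> doc : documents) vocabulary.addAll(doc);

        List<Map<String, Double>> result = new ArrayList<>();
        for (List<String> doc : documents) {
            Map<String, Double> weights = new LinkedHashMap<>();
            for (String term : vocabulary) {
                weights.put(term, tfIdf(doc, documents, term));
            }
            result.add(weights);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("the", "cat", "sat", "on", "the", "mat"),
                Arrays.asList("the", "dog", "sat"));
        for (Map<String, Double> vector : vectors(docs)) {
            System.out.println(vector);
        }
    }
}
```

In this two-document example, "the" and "sat" occur in both documents, so their IDF is log2(2/2) = 0 and their weight is 0 everywhere. Terms found in only one document, such as "cat" or "dog", have IDF log2(2/1) = 1, so their weight equals their term frequency. Recomputing IDF for every (term, document) pair keeps the sketch short. For larger collections, computing each term's IDF once and caching it would be the obvious refinement.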
