Latent Semantic Indexing (LSI) – What Does It Mean For You?
What exactly is Latent Semantic Indexing? Given that there are some 10 billion web pages on the internet, programmers had to figure out a way of helping search engine spiders determine a page’s subject matter.
Latent Semantic Indexing is simply an indexing and retrieval algorithm used to discern relationships within a collection of text. It is mathematically based and assumes that words used in the same contexts tend to have similar meanings. It is these relationships that help search engine spiders discern what a web page is about.
What if I gave you a document to read very quickly? Let’s say 10 pages in 60 seconds. How would you do it?
You would probably scan each page very quickly, looking for a common theme, common terms/phrases or maybe a common concept. Based upon how frequently any one item appears, you would be able to reasonably assume what the document is about.
Fortunately, humans don't have to decipher the billions of web pages on the internet. We do, however, delegate that task to the search engine spiders (bots). Latent Semantic Indexing is an attempt at helping the bots understand each page.
It helps the bots formulate assumptions about both the keywords on a page and the content.
In addition to recording which keywords a document contains, the method examines the document collection as a whole, comparing each document to others that contain some of the same words.
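The quick-scan idea above can be sketched as a plain word-frequency count. This is only an illustration of "how frequently any one item appears", not the Latent Semantic Indexing algorithm itself. The function name top_terms and the sample sentence are made up for this example.

```python
import re
from collections import Counter

def top_terms(text, n=3):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase so "Ball" and "ball" are counted as the same word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

# Hypothetical sample text, invented for illustration.
sample = "The pitcher threw the ball. The catcher caught the ball. Ball game over."
print(top_terms(sample, 2))  # [('the', 4), ('ball', 3)]
```

Notice that "the" comes out on top. Raw frequency alone says little about the subject, which is why filler words are usually dealt with before drawing any conclusion.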
Any two web pages that have several keywords or phrases in common are considered semantically close. Those that do not are not semantically close.
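As a rough sketch of "keywords in common", the snippet below scores two pages by the share of words they have in common (a Jaccard overlap). This is a stand-in chosen for simplicity. Latent Semantic Indexing proper works on a term-document matrix rather than on raw overlap, and the names keywords, overlap and the three sample pages are assumptions made for this example.

```python
import re

def keywords(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(a, b):
    """Jaccard overlap: shared words divided by all distinct words."""
    ka, kb = keywords(a), keywords(b)
    if not ka and not kb:
        return 0.0  # two empty pages have nothing in common
    return len(ka & kb) / len(ka | kb)

# Hypothetical pages, already reduced to their keywords.
page_a = "pitcher catcher umpire bat glove"
page_b = "pitcher catcher bat stadium"
page_c = "oven flour yeast dough"

print(overlap(page_a, page_b))  # 0.5
print(overlap(page_a, page_c))  # 0.0
```

Pages A and B share three of their six distinct words, so they score as close. Pages A and C share none.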
Let’s say a web page is submitted for indexing with Latent Semantic Indexing. The algorithm looks for similarities in every word/phrase on the web page. The comparison is between each word in the page and the same word within the pre-indexed database. As you can imagine, over time the database grows and the bots become reasonably intelligent. Based upon the comparison, the bots can be reasonably expected to determine a page’s subject matter.
What would a web page look like to a search engine spider?
First, strip your page of all extraneous words. Take your document as a whole and remove all pronouns, prepositions, adjectives, conjunctions and verbs. The end result should be a document containing only words with a semantic comparison value.
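A minimal sketch of that stripping step, assuming a small hand-made stop list. STOP_WORDS and strip_extraneous are hypothetical names, and the list is deliberately tiny. It only catches common pronouns, prepositions, conjunctions and articles. Removing verbs and adjectives as well would need a part-of-speech tagger, which is not shown here.

```python
import re

# Hypothetical, deliberately tiny stop list; a real one would be far longer.
STOP_WORDS = {
    "a", "an", "and", "the", "he", "she", "it", "they",
    "of", "to", "with", "in", "on", "is", "was", "or", "but",
}

def strip_extraneous(text):
    """Return the words of text, in order, with stop words removed."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

print(strip_extraneous("The pitcher threw the ball to the catcher"))
# ['pitcher', 'threw', 'ball', 'catcher']
```

The verb "threw" survives because a fixed word list cannot tell parts of speech apart.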
What does all of this mean?
Simply put, reverse engineering how bots view your web pages should help you write a more search-engine-friendly article. For example, if you were to create a web page about baseball, what words or phrases would you expect to see?
In addition to the word baseball, you would also expect to see words such as glove, base, bat, umpire, pitcher, catcher, etc. A search engine spider finding these words within a page would be expected to determine that your page was about a baseball-related topic.
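One way to check a draft against the words you would expect is sketched below. The vocabulary is simply the list from the paragraph above. The names BASEBALL_TERMS and topic_coverage and the draft sentence are assumptions made for illustration, and this is not how any particular search engine scores a page.

```python
import re

# Hypothetical topic vocabulary, taken from the words listed above.
BASEBALL_TERMS = {"baseball", "glove", "base", "bat", "umpire", "pitcher", "catcher"}

def topic_coverage(text, vocabulary=BASEBALL_TERMS):
    """Return the sorted vocabulary terms that appear in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & vocabulary)

# Hypothetical draft sentence, invented for illustration.
draft = "The pitcher nodded to the catcher while the umpire dusted off the plate."
print(topic_coverage(draft))  # ['catcher', 'pitcher', 'umpire']
```

A draft that matches none of the expected terms may be a sign that the page has drifted away from its intended subject.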
Remember, determining the relevancy of your page is half the battle when it comes to accurately ranking it.
When writing content, your first priority is obviously your readers; however, you can also help the bots along. An understanding of Latent Semantic Indexing may lead the way.






