• Also known as
  • Search Results Clustering
Document Clustering is a method for finding structure within large collections of textual information whose content and categories are not known in advance.

Document Clustering uses cluster analysis to extract descriptive contexts from all kinds of text. It can be applied to documents, search results, websites, news, books, emails, and so on.

The extracted contexts make it possible to automatically group similar documents into meaningful topics, hierarchies, or categories (clusters). Clusters make it easier for users and applications to manage, sort, and categorize the given information.

For example, clustering search results helps users find the information they are looking for, because each result set is assigned additional, distinct topics.
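
To make the idea concrete, here is a minimal Python sketch of clustering a handful of search-result snippets. It is an illustration under stated assumptions, not a description of any particular product: the snippets, the stopword list, the TF-IDF weighting, the similarity threshold of 0.2, and the helper names (`tokenize`, `tfidf_vectors`, `cluster`, `label`) are all hypothetical choices. Documents are linked when their cosine similarity reaches the threshold, each connected group becomes a cluster, and the highest-weighted terms of a cluster serve as its descriptive topic.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "is", "with", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency: rare terms weigh more.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two normalized sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def cluster(vectors, threshold=0.2):
    """Link documents whose similarity reaches the threshold (an assumed value)
    and return the connected groups as sorted lists of document indices."""
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def label(vectors, members, top_n=2):
    """Describe a cluster by its highest-weighted terms (ties broken alphabetically)."""
    totals = Counter()
    for i in members:
        totals.update(vectors[i])
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


# Made-up search-result snippets for the ambiguous query "jaguar".
results = [
    "Jaguar car dealership offers new luxury car models",
    "Used Jaguar car prices and luxury car reviews",
    "The jaguar is a big cat living in the rainforest",
    "Jaguar cat habitat: rainforest conservation for the big cat",
]

vectors = tfidf_vectors(results)
clusters = cluster(vectors)
for members in clusters:
    print(label(vectors, members), [results[i] for i in members])
```

With this toy data the two car-related snippets end up in one cluster and the two animal-related snippets in another, even though all four share the query term. The threshold controls how coarse the grouping is: a lower value merges more documents into fewer clusters, a higher value splits them into more, smaller ones.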