We have developed a method for extracting keywords from Polish-language texts.
Developing a suitable algorithm was not easy, not least because long strings of nouns, adjectives and symbols appear quite frequently in Polish.
The proprietary solution is called the Polish KeyWord Extractor. Extraction of key phrases from an article begins with pre-processing, in which the text is divided into sentences and words. Each word is then described by certain characteristics, such as number (form), part of speech and gender. Next, potential keyword candidates are identified. In the final stage, the candidates are assessed and a preset number of keywords is presented.
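The stages described above can be illustrated with a minimal Python sketch. It is not the Polish KeyWord Extractor itself. The toy lexicon, the function names, the rule that a candidate is a run of nouns and adjectives containing at least one noun, and the frequency-based ranking are all assumptions made for illustration. For brevity the sketch records only the part of speech, whereas a real system would obtain number, gender and other features from a Polish morphological analyser.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word form -> part of speech), for illustration only.
# A real system would query a Polish morphological analyser instead.
TOY_LEXICON = {
    "system": "noun",
    "oceny": "noun",
    "wniosków": "noun",
    "recenzentów": "noun",
    "naukowych": "adj",
    "wspiera": "verb",
    "wybiera": "verb",
}


def split_sentences(text):
    # Naive split on sentence-ending punctuation followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Unicode-aware word tokens, lowercased.
    return [w.lower() for w in re.findall(r"\w+", sentence)]


def tag(tokens):
    # Attach a part of speech to each token; unknown words become "other".
    return [(t, TOY_LEXICON.get(t, "other")) for t in tokens]


def find_candidates(tagged):
    # Assumed rule: a candidate is a maximal run of nouns and adjectives
    # that contains at least one noun.
    phrases, run = [], []
    for word, pos in tagged + [("", "end")]:  # sentinel flushes the last run
        if pos in ("noun", "adj"):
            run.append((word, pos))
        else:
            if any(p == "noun" for _, p in run):
                phrases.append(" ".join(w for w, _ in run))
            run = []
    return phrases


def extract_keywords(text, n=5):
    # Assumed scoring: rank by frequency, then by phrase length, then alphabetically.
    counts = Counter()
    for sentence in split_sentences(text):
        counts.update(find_candidates(tag(tokenize(sentence))))
    ranked = sorted(counts, key=lambda p: (-counts[p], -len(p.split()), p))
    return ranked[:n]


text = ("System oceny wniosków wspiera recenzentów. "
        "System oceny wniosków wybiera recenzentów naukowych.")
print(extract_keywords(text, n=2))
# ['system oceny wniosków', 'recenzentów naukowych']
```

The `n` parameter plays the role of the preset number of keywords. The long noun run "system oceny wniosków" is kept as a single phrase, which is the kind of string that makes Polish keyword extraction harder than single-word counting.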
More information on this topic can be found in the second volume of the publication edited by Jaroslaw Protasiewicz, PhD, "Procedures of review and reviewer selection".
Project Title: Polish KeyWord Extractor