IMPROVED METHOD OF EXTRACTION OF KEYWORDS IN THE WEB-TEXT
DOI:
https://doi.org/10.31649/1999-9941-2018-43-3-43-47
Keywords:
Natural Language Processing, Text Mining, Keyword Extraction, Term Extraction, Computational Linguistics
Abstract
The paper proposes an improvement to the method of extracting keywords and key phrases from web texts. With the aim of speeding up the indexing and abstracting of web texts, it examines the main stages of forming a set of keywords and phrases: cleaning the source text, removing stop words, separating word stems from endings, and forming keywords and phrases from the source text. The proposed improvement relies on a subject-area dictionary compiled by an expert. Because the dictionary is built with regard to how often keywords and phrases recur in the web text, it improves their relevance. The quality of keyword and phrase extraction from Ukrainian- and English-language web texts was compared, in terms of recall, precision, and F-measure, across the Expert Review, OpenCalais, and Extractor systems and a system based on the proposed dictionary-based method. The analysis showed that the proposed method identifies relevant keywords and phrases in Ukrainian and English web texts with a 9.5% increase in F-measure and a 15% increase in recall and precision.
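The four stages and the evaluation described in the abstract can be illustrated with a short Python sketch. It is a hypothetical illustration, not the authors' implementation. The stop-word list `STOP_WORDS`, the crude suffix-stripping `stem` (a stand-in for a real stemmer such as Porter's for English; Ukrainian would need its own), the sample expert dictionary, and ranking terms by raw occurrence count are all assumptions. Dictionary entries pass through the same pipeline as the text, so "keywords extraction" in a document matches the entry "keyword extraction". The helper `precision_recall_f1` scores the result against a reference keyword set, with F-measure taken as the harmonic mean of precision and recall.

```python
import re
from collections import Counter

# Hypothetical English stop-word list; a real system needs a full list per language.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on", "with", "from"}

# Hypothetical suffixes for a crude stemmer (a stand-in for e.g. the Porter stemmer).
SUFFIXES = ("ing", "ed", "s")


def clean(text: str) -> list[str]:
    """Stage 1: strip HTML tags, lowercase, and keep letter-only tokens."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.findall(r"[^\W\d_]+", text.lower())  # Unicode letters, so Cyrillic works too


def stem(word: str) -> str:
    """Stage 3: strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> tuple[str, ...]:
    """Stages 1-3: clean the text, drop stop words (stage 2), stem the rest."""
    return tuple(stem(tok) for tok in clean(text) if tok not in STOP_WORDS)


def extract_keywords(text: str, dictionary: list[str], top_k: int = 5, max_n: int = 3) -> list[str]:
    """Stage 4: count expert-dictionary terms (as stemmed n-grams) and rank by frequency."""
    # Normalize dictionary entries the same way as the text so that word forms match.
    lookup = {normalize(term): term for term in dictionary}
    tokens = normalize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i : i + n]
            if gram in lookup:
                counts[lookup[gram]] += 1
    # Ties keep the order in which terms were first counted.
    return [term for term, _ in counts.most_common(top_k)]


def precision_recall_f1(extracted: list[str], reference: list[str]) -> tuple[float, float, float]:
    """Compare extracted keywords with a reference set (e.g. one chosen by an expert)."""
    found, gold = set(extracted), set(reference)
    hits = len(found & gold)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    vocab = ["keyword extraction", "web text", "stemming", "stop word", "indexing", "recall"]
    doc = ("<p>Keyword extraction speeds up indexing of web texts. "
           "Extracting keywords and key phrases from web texts relies on a stop word list "
           "and stemming; keyword extraction quality is measured with recall.</p>")
    found = extract_keywords(doc, vocab, top_k=2)
    print(found)  # ['keyword extraction', 'web text']
    print(precision_recall_f1(found, ["keyword extraction", "web text", "stemming"]))
```

Because dictionary terms are looked up only among contiguous stemmed n-grams, a reordered phrase such as "extraction of keywords" is not matched to "keyword extraction" in this sketch; handling such variants would need additional rules.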