data_processes/readme/readme-keyword-extractor-en.md
2025-08-16 15:57:21 +03:30

1.3 KiB

Keyword Extractor

This source is a script for extracting keywords from text using local LLM such as llama based on user prompts.

How it works

The script processes input text and extracts the most relevant keywords using a large language model(llm) and system and user prompts which are embedded in the source code.

Requirements

  • Python 3.8+
  • NLP libraries (transformers, torch, etc.)
  • Other utilities as listed in the requirements file

For exact versions of the libraries, please check the requirements.txt file.

Usage

  1. Clone the repository.
  2. Install dependencies:
    pip install -r requirements.txt
    
  3. Run the script:
    python keyword_extractor.py
    

Main Methods

  • load_model(): Loads the pre-trained transformer model for text processing. This is the main method for model initialization.
  • preprocess_text(text): Cleans and prepares the input text (e.g., lowercasing, removing stopwords, etc.).
  • extract_keywords(text, top_n=10): The core method that applies the model and retrieves the top keywords from the input text.
  • display_results(keywords): Prints or saves the extracted keywords for further use.

Model

The script uses a LLM such as llama3.1-8B for keyword extraction. The exact model can be changed in the code if needed.