data_processes/readme/ner-recognizer-en.md
2025-08-16 15:14:27 +03:30

Named Entity Recognition (NER) Script

This project provides a Python script (p2_ner_recognizer.py) for extracting named entities from text sections using a trained NER model. The script is designed to identify entities such as names, organizations, locations, and more, which is useful for information extraction and text analysis tasks.

Requirements

Before using this script, install the required library:

pip install flair

You also need a trained NER model. Update the model path in the script to point to your model file.
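
Since a wrong model path is a common source of errors, it can help to check the path before loading. The following is a minimal sketch, not the script's actual code: MODEL_PATH and load_ner_model are hypothetical names, and the path shown is only a placeholder. The loading call itself is Flair's SequenceTagger.load.

```python
from pathlib import Path

# Hypothetical placeholder; replace with the path to your own trained model.
MODEL_PATH = "./models/ner/final-model.pt"


def load_ner_model(model_path=MODEL_PATH):
    """Load a Flair sequence tagger, failing early if the file is missing."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"NER model not found: {path}")
    # Imported here so that a wrong path is reported before Flair is loaded.
    from flair.models import SequenceTagger

    return SequenceTagger.load(str(path))
```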

How It Works

  • The script loads a trained NER model using the Flair library.
  • It processes each text section, splits long texts into smaller parts if needed, and extracts named entities.
  • The results are saved in a JSON file for further use.
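
The splitting step can be illustrated with a short sketch. The rule the script actually uses to split long texts is not documented here, so this example assumes a simple word-count limit; split_into_chunks and max_words are hypothetical names chosen for illustration.

```python
def split_into_chunks(text, max_words=100):
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    # An empty or whitespace-only text yields no chunks.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk could then be passed to the model separately. Note that any entity positions reported per chunk would need to be shifted back to positions in the full text.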

Main Functions

  • single_ner_recognizer(input_sentence): Extracts named entities from a single sentence or text.
  • do_ner_recognize(sections): Processes all sections in a dictionary, extracts entities, and saves the results.
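
How the two functions relate can be sketched as follows. This is an assumption about the overall flow, not the script's real implementation: annotate_sections is a hypothetical helper, and the recognize argument stands in for a function such as single_ner_recognizer.

```python
def annotate_sections(sections, recognize):
    """Return a copy of sections with a 'ners_v2' entity list per section."""
    annotated = {}
    for section_id, section in sections.items():
        entry = dict(section)  # shallow copy, so the input is left unchanged
        entry["ners_v2"] = recognize(section.get("content", ""))
        annotated[section_id] = entry
    return annotated
```

Passing the recognizer as an argument keeps the loop easy to try out with a stub function, without loading a model.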

Usage Example

Suppose you have your sections data as a dictionary:

sections = {
    "1": {"content": "First section text"},
    "2": {"content": "Second section text"}
}

You can extract named entities for all sections as follows:

from p2_ner_recognizer import do_ner_recognize

result = do_ner_recognize(sections)

When the call completes, the results are saved to a JSON file in the ./data/ner/ directory.
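
For reference, writing such a result to disk might look like the sketch below. The output file name is not stated in this document, so ner_result.json is a hypothetical placeholder, as is the helper name save_ner_results. Setting ensure_ascii=False keeps non-ASCII text (for example Persian) readable in the file.

```python
import json
from pathlib import Path


def save_ner_results(result, out_dir="./data/ner", filename="ner_result.json"):
    """Write the result dictionary as UTF-8 JSON and return the file path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create ./data/ner/ if missing
    target = out_path / filename
    with target.open("w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    return target
```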

Output Structure

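Each section gains a new field, ners_v2, holding a list of the extracted entities. Each entity records its type (key), its text (value), its span in the text (begin and end), and the model's confidence (score):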

"1": {
  "content": "First section text",
  "ners_v2": [
    {"key": "PERSON", "value": "John Doe", "begin": 0, "end": 2, "score": 0.98},
    ...
  ]
}
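
As an illustration of consuming this structure, the sketch below groups a section's entity values by type and drops low-confidence entries. The helper name group_entities and the min_score threshold are hypothetical; only the field names (ners_v2, key, value, score) come from the structure shown above.

```python
from collections import defaultdict


def group_entities(section, min_score=0.0):
    """Group entity values by their 'key', keeping scores >= min_score."""
    grouped = defaultdict(list)
    for entity in section.get("ners_v2", []):
        if entity["score"] >= min_score:
            grouped[entity["key"]].append(entity["value"])
    return dict(grouped)
```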

Notes

  • Make sure the model path is correct and the model file is available.
  • The script supports Persian text, provided the model was trained on Persian data.
  • The output JSON file will be saved in ./data/ner/.