# Named Entity Recognition (NER) Script

This project provides a Python script (`p2_ner_recognizer.py`) that extracts named entities from text sections using a trained NER model. It identifies entities such as person names, organizations, and locations, which is useful for information extraction and text analysis tasks.

## Requirements

Before using this script, please install the required libraries:

```bash
pip install flair
```

You also need a trained NER model. Update the `model` path in the script to point to your model file.

## How It Works

- The script loads a trained NER model using the Flair library.
- It processes each text section, splits long texts into smaller parts if needed, and extracts named entities.
- The results are saved in a JSON file for further use.
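
The steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper names (`split_text`, `process_sections`, `save_results`), the word-based splitting rule, and the `max_words` limit are all assumptions made for the example. The `extract` argument stands in for whatever function performs entity extraction on one chunk of text.

```python
import json
from pathlib import Path


def split_text(text, max_words=100):
    """Split text into chunks of at most `max_words` whitespace-separated words.

    Hypothetical splitting rule; the real script may split differently.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def process_sections(sections, extract, max_words=100):
    """Run `extract` on every chunk of each section and store the results under `ners_v2`."""
    for section in sections.values():
        entities = []
        for chunk in split_text(section["content"], max_words):
            entities.extend(extract(chunk))
        section["ners_v2"] = entities
    return sections


def save_results(sections, path):
    """Write the sections dictionary to a JSON file, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-Latin text (e.g. Persian) readable in the file
    path.write_text(json.dumps(sections, ensure_ascii=False, indent=2), encoding="utf-8")
```

Passing the extraction step in as a function keeps the splitting and saving logic independent of the model, so it can be tried out with a stub before a trained model is available.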

## Main Functions

- `single_ner_recognizer(input_sentence)`: Extracts named entities from a single sentence or text.
- `do_ner_recognize(sections)`: Processes all sections in a dictionary, extracts entities, and saves the results.
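
For orientation, here is a hedged sketch of what a single-text recognizer built on Flair might look like. It is not the code from `p2_ner_recognizer.py`: the function names `load_tagger`, `span_to_entity`, and `recognize` are hypothetical, the label type is assumed to be `"ner"`, and `begin`/`end` are taken here from Flair's character offsets, whereas the script may report token indices instead.

```python
def load_tagger(model_path):
    """Load a Flair sequence tagger from a local model file (assumed path)."""
    # Imported lazily so the rest of this sketch runs without Flair installed.
    from flair.models import SequenceTagger
    return SequenceTagger.load(model_path)


def span_to_entity(span):
    """Convert a Flair span into a dictionary shaped like the `ners_v2` entries."""
    return {
        "key": span.tag,               # predicted entity label
        "value": span.text,            # surface text of the entity
        "begin": span.start_position,  # character offset (assumption, see above)
        "end": span.end_position,
        "score": float(span.score),    # model confidence
    }


def recognize(tagger, text):
    """Tag one piece of text and return its entities as a list of dictionaries."""
    from flair.data import Sentence
    sentence = Sentence(text)
    tagger.predict(sentence)
    # "ner" is the assumed label type; a custom model may use a different name.
    return [span_to_entity(span) for span in sentence.get_spans("ner")]
```

A call would then look like `recognize(load_tagger("path/to/model.pt"), "some text")`, where the model path is a placeholder for your own file.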

## Usage Example

Suppose you have your sections data as a dictionary:

```python
sections = {
    "1": {"content": "First section text"},
    "2": {"content": "Second section text"}
}
```

You can extract named entities for all sections as follows:

```python
from p2_ner_recognizer import do_ner_recognize

result = do_ner_recognize(sections)
```

After the call completes, the results are saved as a JSON file in the `./data/ner/` directory.

## Output Structure

Each section will have a new field `ners_v2` with the extracted entities:

```json
"1": {
  "content": "First section text",
  "ners_v2": [
    {"key": "PERSON", "value": "John Doe", "begin": 0, "end": 2, "score": 0.98},
    ...
  ]
}
```
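
Given this structure, the entities can be post-processed with plain Python. The sketch below groups entity values by type and drops low-confidence hits. The function name `group_entities` and the `min_score` threshold are assumptions for illustration and are not part of the script.

```python
from collections import defaultdict


def group_entities(sections, min_score=0.0):
    """Collect entity values per entity type, keeping only scores >= min_score."""
    grouped = defaultdict(list)
    for section in sections.values():
        # Sections without a `ners_v2` field are skipped silently.
        for entity in section.get("ners_v2", []):
            if entity["score"] >= min_score:
                grouped[entity["key"]].append(entity["value"])
    return dict(grouped)


# Example input in the same shape as the output shown above
example = {
    "1": {
        "content": "First section text",
        "ners_v2": [
            {"key": "PERSON", "value": "John Doe", "begin": 0, "end": 2, "score": 0.98},
        ],
    }
}
print(group_entities(example, min_score=0.9))
```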

## Notes

- Make sure the model path is correct and the model file is available.
- The script supports Persian text, provided the model was trained on Persian data.
- The output JSON file will be saved in `./data/ner/`.