data_processes/readme/readme-ner-recognizer-en.md
2025-08-16 15:15:24 +03:30

# Named Entity Recognition (NER) Script
This project provides a Python script (`p2_ner_recognizer.py`) that extracts named entities from text sections using a trained NER model. It identifies entities such as person names, organizations, and locations, which is useful for information extraction and text analysis tasks.
## Requirements
Before using this script, please install the required libraries:
```bash
pip install flair
```
You also need a trained NER model. Update the `model` path in the script to point to your model file.
## How It Works
- The script loads a trained NER model using the Flair library.
- It processes each text section, splits long texts into smaller parts if needed, and extracts named entities.
- The results are saved in a JSON file for further use.
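The splitting step can be sketched as follows. This is a minimal illustration, not the script's actual logic: the helper name `split_long_text` and the 512-character limit are assumptions chosen for the example, and the real script may split on sentences or tokens instead.

```python
def split_long_text(text, max_chars=512):
    """Split text into chunks of at most max_chars, breaking at whitespace.

    Hypothetical helper (not taken from p2_ner_recognizer.py).
    Runs of whitespace are collapsed to single spaces, and a single word
    longer than max_chars is kept whole in its own chunk.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars or not current:
            # Either the word still fits, or it starts a new (possibly oversized) chunk.
            current = candidate
        else:
            # The word does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to the model separately and the entities merged afterwards.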
## Main Functions
- `single_ner_recognizer(input_sentence)`: Extracts named entities from a single sentence or text.
- `do_ner_recognize(sections)`: Processes all sections in a dictionary, extracts entities, and saves the results.
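For orientation, here is a rough sketch of what a single-text recognizer built on Flair might look like. It is not the code of `single_ner_recognizer`: the names `recognize_entities` and `spans_to_records` are hypothetical, the span attributes assume a recent Flair release (attribute names have varied across versions), and this README does not say whether `begin`/`end` are character or token offsets, so character offsets are used here as an assumption.

```python
def spans_to_records(spans):
    """Convert Flair-style spans into dicts shaped like the `ners_v2` entries.

    Assumes each span exposes `text`, `start_position`, `end_position`,
    and `get_label("ner")` returning an object with `value` and `score`.
    """
    records = []
    for span in spans:
        label = span.get_label("ner")
        records.append({
            "key": label.value,
            "value": span.text,
            "begin": span.start_position,  # assumption: character offsets
            "end": span.end_position,
            "score": float(label.score),
        })
    return records


def recognize_entities(text, model_path):
    """Hypothetical stand-in for `single_ner_recognizer`."""
    # Imported here so spans_to_records can be used without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Loading the model on every call is only for brevity; load it once in real code.
    tagger = SequenceTagger.load(model_path)
    sentence = Sentence(text)
    tagger.predict(sentence)
    return spans_to_records(sentence.get_spans("ner"))
```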
## Usage Example
Suppose you have your sections data as a dictionary:
```python
sections = {
    "1": {"content": "First section text"},
    "2": {"content": "Second section text"},
}
```
You can extract named entities for all sections as follows:
```python
from p2_ner_recognizer import do_ner_recognize
result = do_ner_recognize(sections)
```
After the call completes, the results are saved to a JSON file in the `./data/ner/` directory.
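A save step of this kind might look like the sketch below. The function name `save_ner_results` and the file name `ner_results.json` are placeholders invented for the example; the README does not state the actual output file name.

```python
import json
from pathlib import Path


def save_ner_results(sections, out_dir="./data/ner", file_name="ner_results.json"):
    """Write sections to a UTF-8 JSON file; `file_name` is a made-up placeholder."""
    out_path = Path(out_dir) / file_name
    # Create the output directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII text (e.g. Persian) readable in the file.
        json.dump(sections, f, ensure_ascii=False, indent=2)
    return out_path
```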
## Output Structure
Each section will have a new field `ners_v2` with the extracted entities:
```json
"1": {
    "content": "First section text",
    "ners_v2": [
        {"key": "PERSON", "value": "John Doe", "begin": 0, "end": 2, "score": 0.98},
        ...
    ]
}
```
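Downstream code can work with this structure directly. As an illustration, the sketch below groups entity values by label and drops low-confidence entries; `group_entities` and the `min_score` threshold are hypothetical and not part of the script.

```python
from collections import defaultdict


def group_entities(sections, min_score=0.0):
    """Collect entity values per label across all sections, skipping low scores."""
    grouped = defaultdict(list)
    for section in sections.values():
        # Sections without a `ners_v2` field are treated as having no entities.
        for ent in section.get("ners_v2", []):
            if ent["score"] >= min_score:
                grouped[ent["key"]].append(ent["value"])
    return dict(grouped)


sections = {
    "1": {
        "content": "First section text",
        "ners_v2": [
            {"key": "PERSON", "value": "John Doe", "begin": 0, "end": 2, "score": 0.98},
        ],
    },
    "2": {"content": "Second section text", "ners_v2": []},
}
print(group_entities(sections, min_score=0.5))  # {'PERSON': ['John Doe']}
```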
## Notes
- Make sure the model path is correct and the model file is available.
- The script can process Persian text, provided the model was trained on Persian data.
- The output JSON file will be saved in `./data/ner/`.