# Named Entity Recognition (NER) Script

This project provides a Python script (`p2_ner_recognizer.py`) that extracts named entities from text sections using a trained NER model. It identifies entities such as names, organizations, and locations, which is useful for information extraction and text analysis tasks.

## Requirements

Before using this script, please install the required libraries:

```bash
pip install flair
```

You also need a trained NER model. Update the `model` path in the script to point to your model file.

## How It Works

- The script loads a trained NER model using the Flair library.
- It processes each text section, splits long texts into smaller parts if needed, and extracts named entities.
- The results are saved in a JSON file for further use.

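The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: `split_long_text`, `extract_entities`, and the `max_chars` limit are hypothetical names, the splitting rule (sentence boundaries with a hard wrap as fallback) is a guess at what "splits long texts" means, and the Flair calls assume a recent Flair release and a model whose label type is `ner`. The `begin` and `end` offsets are left out because the README does not say how they are computed.

```python
import re


def split_long_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Break after sentence-ending punctuation, including the Persian question mark.
    sentences = re.split(r"(?<=[.!?؟])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def extract_entities(text, model_path, max_chars=1000):
    """Hypothetical helper: run a Flair tagger over each chunk of the text."""
    # Imported lazily so the splitting helper works without Flair installed.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_path)
    entities = []
    for chunk in split_long_text(text, max_chars):
        sentence = Sentence(chunk)
        tagger.predict(sentence)
        for span in sentence.get_spans("ner"):
            label = span.get_label("ner")
            entities.append(
                {"key": label.value, "value": span.text, "score": label.score}
            )
    return entities
```

Splitting at sentence boundaries keeps an entity from being cut in half between two chunks, which a fixed-width split could do.
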
## Main Functions

- `single_ner_recognizer(input_sentence)`: Extracts named entities from a single sentence or text.
- `do_ner_recognize(sections)`: Processes all sections in a dictionary, extracts entities, and saves the results.

## Usage Example

Suppose you have your sections data as a dictionary:

```python
sections = {
    "1": {"content": "First section text"},
    "2": {"content": "Second section text"}
}
```

You can extract named entities for all sections as follows:

```python
from p2_ner_recognizer import do_ner_recognize

result = do_ner_recognize(sections)
```

After running, the results will be saved in a JSON file in the `./data/ner/` directory.

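The name of the output file is not stated here, so one way to pick up the saved results is to load the most recently modified JSON file in that directory. The helper below is a minimal sketch; `load_latest_result` is a hypothetical name and not part of `p2_ner_recognizer.py`.

```python
import json
from pathlib import Path


def load_latest_result(result_dir="./data/ner"):
    """Return the parsed contents of the newest JSON file in result_dir, or None."""
    # Sort by modification time so the last entry is the most recent file.
    json_files = sorted(
        Path(result_dir).glob("*.json"), key=lambda path: path.stat().st_mtime
    )
    if not json_files:
        return None
    with json_files[-1].open(encoding="utf-8") as handle:
        return json.load(handle)
```

Reading with `encoding="utf-8"` matters when the sections contain non-ASCII text such as Persian.
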
## Output Structure

Each section will have a new field `ners_v2` with the extracted entities:

```json
"1": {
    "content": "First section text",
    "ners_v2": [
        {"key": "PERSON", "value": "John Doe", "begin": 0, "end": 2, "score": 0.98},
        ...
    ]
}
```

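As an illustration of working with this structure, the sketch below groups entity values by their `key` and drops low-confidence entries. `group_entities` and the `min_score` threshold are hypothetical and not part of the script, and the second entity in the sample data is made up for the example.

```python
from collections import defaultdict


def group_entities(sections, min_score=0.0):
    """Map each entity type to the distinct values found across all sections."""
    grouped = defaultdict(list)
    for section in sections.values():
        # Sections without a "ners_v2" field are treated as having no entities.
        for entity in section.get("ners_v2", []):
            if entity["score"] < min_score:
                continue
            values = grouped[entity["key"]]
            if entity["value"] not in values:
                values.append(entity["value"])
    return dict(grouped)


# Sample data shaped like the output above (values are illustrative only).
sample = {
    "1": {
        "content": "First section text",
        "ners_v2": [
            {"key": "PERSON", "value": "John Doe", "begin": 0, "end": 2, "score": 0.98},
            {"key": "ORG", "value": "Acme", "begin": 5, "end": 6, "score": 0.41},
        ],
    }
}
print(group_entities(sample, min_score=0.5))
```
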
## Notes

- Make sure the model path is correct and the model file is available.
- The script supports Persian text if the model was trained on it.
- The output JSON file will be saved in `./data/ner/`.