55 lines
2.1 KiB
Markdown
55 lines
2.1 KiB
Markdown
# Persian Sentence Representation Script
|
|
|
|
This script (`p5_representer.py`) is designed to simplify and represent complex Persian legal sentences as a set of simpler, more understandable sentences. It uses the `meta-llama/Meta-Llama-3.1-8B-Instruct` model for this task.
|
|
|
|
**Note:** For library versions, please refer to the `requirements.txt` file.
|
|
|
|
## Model Used
|
|
|
|
- Model: `meta-llama/Meta-Llama-3.1-8B-Instruct`
|
|
- Loaded via HuggingFace Transformers (`AutoModelForCausalLM`, `AutoTokenizer`)
|
|
|
|
## System and User Prompts
|
|
|
|
- **System prompt:** Sets the model as a legal expert who explains legal texts in simple language for non-experts, without changing technical terms.
|
|
- **User prompt:** Asks the model to rewrite the input legal text in a specified number of simple sentences in Persian.
|
|
|
|
## Main Methods
|
|
|
|
### 1. `single_section_representation(content)`
|
|
- **Purpose:** Simplifies a single legal text section.
|
|
- **Inputs:**
|
|
- `content` (str): The legal text to be simplified.
|
|
- **Outputs:**
|
|
- `result` (bool): Operation status.
|
|
- `desc` (str): Description of the result.
|
|
- `sentences` (list): List of simplified sentences.
|
|
|
|
### 2. `do_representation(sections)`
|
|
- **Purpose:** Processes multiple sections and saves the results.
|
|
- **Inputs:**
|
|
- `sections` (dict): Dictionary where each key is a section ID and each value contains a `content` field.
|
|
- **Outputs:**
|
|
- `operation_result` (bool): Overall operation status.
|
|
- `sections` (dict): The input dictionary with an added `represented_sentences` field for each section.
|
|
|
|
## Example Input
|
|
|
|
```python
|
|
sections = {
|
|
"1": {"content": "این یک متن حقوقی پیچیده است که باید ساده شود."},
|
|
"2": {"content": "متن حقوقی دوم برای بازنمایی."}
|
|
}
|
|
result, output_sections = do_representation(sections)
|
|
```
|
|
|
|
## Output
|
|
|
|
Each section will have a new field `represented_sentences` containing the simplified sentences.
|
|
|
|
## Notes
|
|
|
|
- The script automatically uses GPU if available.
|
|
- Errors for each section are logged in the `./data/represent/` directory.
|
|
- The output JSON file is saved in `./data/represent/`.
|