Persian Sentence Representation Script

This script (p5_representer.py) is designed to simplify and represent complex Persian legal sentences as a set of simpler, more understandable sentences. It uses the meta-llama/Meta-Llama-3.1-8B-Instruct model for this task.

Note: For library versions, please refer to the requirements.txt file.

Model Used

Model: meta-llama/Meta-Llama-3.1-8B-Instruct
Loaded via HuggingFace Transformers (AutoModelForCausalLM, AutoTokenizer)

System and User Prompts

System prompt: Sets the model as a legal expert who explains legal texts in simple language for non-experts, without changing technical terms.
User prompt: Asks the model to rewrite the input legal text in a specified number of simple sentences in Persian.

Main Methods

1. `single_section_representation(content)`

Purpose: Simplifies a single legal text section.
Inputs:
- content (str): The legal text to be simplified.
Outputs:
- result (bool): Operation status.
- desc (str): Description of the result.
- sentences (list): List of simplified sentences.

2. `do_representation(sections)`

Purpose: Processes multiple sections and saves the results.
Inputs:
- sections (dict): Dictionary where each key is a section ID and each value contains a content field.
Outputs:
- operation_result (bool): Overall operation status.
- sections (dict): The input dictionary with an added represented_sentences field for each section.

Example Input

sections = {
    "1": {"content": "این یک متن حقوقی پیچیده است که باید ساده شود."},
    "2": {"content": "متن حقوقی دوم برای بازنمایی."}
}
result, output_sections = do_representation(sections)

Output

Each section will have a new field represented_sentences containing the simplified sentences.

Notes

The script automatically uses GPU if available.
Errors for each section are logged in the ./data/represent/ directory.
The output JSON file is saved in ./data/represent/.

2.1 KiB Raw Blame History