2.1 KiB
2.1 KiB
Persian Sentence Representation Script
This script (p5_representer.py
) is designed to simplify and represent complex Persian legal sentences as a set of simpler, more understandable sentences. It uses the meta-llama/Meta-Llama-3.1-8B-Instruct
model for this task.
Note: For library versions, please refer to the requirements.txt
file.
Model Used
- Model:
meta-llama/Meta-Llama-3.1-8B-Instruct
- Loaded via HuggingFace Transformers (
AutoModelForCausalLM
,AutoTokenizer
)
System and User Prompts
- System prompt: Sets the model as a legal expert who explains legal texts in simple language for non-experts, without changing technical terms.
- User prompt: Asks the model to rewrite the input legal text in a specified number of simple sentences in Persian.
Main Methods
1. single_section_representation(content)
- Purpose: Simplifies a single legal text section.
- Inputs:
content
(str): The legal text to be simplified.
- Outputs:
result
(bool): Operation status.desc
(str): Description of the result.sentences
(list): List of simplified sentences.
2. do_representation(sections)
- Purpose: Processes multiple sections and saves the results.
- Inputs:
sections
(dict): Dictionary where each key is a section ID and each value contains acontent
field.
- Outputs:
operation_result
(bool): Overall operation status.sections
(dict): The input dictionary with an addedrepresented_sentences
field for each section.
Example Input
sections = {
"1": {"content": "این یک متن حقوقی پیچیده است که باید ساده شود."},
"2": {"content": "متن حقوقی دوم برای بازنمایی."}
}
result, output_sections = do_representation(sections)
Output
Each section will have a new field represented_sentences
containing the simplified sentences.
Notes
- The script automatically uses GPU if available.
- Errors for each section are logged in the
./data/represent/
directory. - The output JSON file is saved in
./data/represent/
.