# Persian Sentence Representation Script

This script (`p5_representer.py`) is designed to simplify and represent complex Persian legal sentences as a set of simpler, more understandable sentences. It uses the `meta-llama/Meta-Llama-3.1-8B-Instruct` model for this task.

**Note:** For library versions, please refer to the `requirements.txt` file.

## Model Used

- Model: `meta-llama/Meta-Llama-3.1-8B-Instruct`
- Loaded via HuggingFace Transformers (`AutoModelForCausalLM`, `AutoTokenizer`)

## System and User Prompts

- **System prompt:** Sets the model as a legal expert who explains legal texts in simple language for non-experts, without changing technical terms.
- **User prompt:** Asks the model to rewrite the input legal text in a specified number of simple sentences in Persian.

## Main Methods

### 1. `single_section_representation(content)`

- **Purpose:** Simplifies a single legal text section.
- **Inputs:**
  - `content` (str): The legal text to be simplified.
- **Outputs:**
  - `result` (bool): Operation status.
  - `desc` (str): Description of the result.
  - `sentences` (list): List of simplified sentences.

### 2. `do_representation(sections)`

- **Purpose:** Processes multiple sections and saves the results.
- **Inputs:**
  - `sections` (dict): Dictionary where each key is a section ID and each value contains a `content` field.
- **Outputs:**
  - `operation_result` (bool): Overall operation status.
  - `sections` (dict): The input dictionary with an added `represented_sentences` field for each section.

## Example Input

```python
sections = {
    "1": {"content": "این یک متن حقوقی پیچیده است که باید ساده شود."},
    "2": {"content": "متن حقوقی دوم برای بازنمایی."}
}
result, output_sections = do_representation(sections)
```

## Output

Each section will have a new field `represented_sentences` containing the simplified sentences.

## Notes

- The script automatically uses GPU if available.
- Errors for each section are logged in the `./data/represent/` directory.
- The output JSON file is saved in `./data/represent/`.
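
## Illustrative Sketch

To make the input and output shapes described under Main Methods concrete, the sketch below mimics the documented return contract without loading the model. It is not the script's actual implementation: `stub_single_section_representation` and `stub_do_representation` are hypothetical stand-ins, the period-based splitting replaces the model call, and the rule that one failed section marks the whole run as failed is an assumption.

```python
from typing import Dict, List, Tuple


def stub_single_section_representation(content: str) -> Tuple[bool, str, List[str]]:
    """Hypothetical stand-in that returns (result, desc, sentences).

    It splits on periods instead of calling the Llama model.
    """
    if not content.strip():
        return False, "empty content", []
    sentences = [part.strip() for part in content.split(".") if part.strip()]
    return True, "ok", sentences


def stub_do_representation(sections: Dict[str, dict]) -> Tuple[bool, Dict[str, dict]]:
    """Hypothetical stand-in that returns (operation_result, sections)."""
    operation_result = True
    for section_id, section in sections.items():
        result, desc, sentences = stub_single_section_representation(
            section.get("content", "")
        )
        if not result:
            # Assumption: a single failed section marks the whole run as failed.
            operation_result = False
        # Each section gains the documented `represented_sentences` field.
        section["represented_sentences"] = sentences
    return operation_result, sections


sections = {
    "1": {"content": "First clause. Second clause."},
    "2": {"content": ""},
}
operation_result, output_sections = stub_do_representation(sections)

for section_id, section in output_sections.items():
    print(section_id, section["represented_sentences"])
```

Callers can check `operation_result` first and then read `represented_sentences` per section. How the real script reports a partially failed run may differ from this sketch.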