insert representer readme files

2025-08-16 16:29:50 +03:30 · 2025-08-16 16:29:50 +03:30 · 16edcb599d
commit 16edcb599d
parent 3bdaddcd61
3 changed files with 108 additions and 1 deletions
--- a/p5_representer.py
+++ b/p5_representer.py
@ -18,7 +18,6 @@ counter = 0
 total = 0

 id = ''
-keywords_count = 15
        
 def single_section_representation(content):
    """
--- a/readme/readme-representer-en.md
+++ b/readme/readme-representer-en.md
@ -0,0 +1,54 @@
+# Persian Sentence Representation Script
+
+This script (`p5_representer.py`) is designed to simplify and represent complex Persian legal sentences as a set of simpler, more understandable sentences. It uses the `meta-llama/Meta-Llama-3.1-8B-Instruct` model for this task.
+
+**Note:** For library versions, please refer to the `requirements.txt` file.
+
+## Model Used
+
+- Model: `meta-llama/Meta-Llama-3.1-8B-Instruct`
+- Loaded via HuggingFace Transformers (`AutoModelForCausalLM`, `AutoTokenizer`)
+
+## System and User Prompts
+
+- **System prompt:** Sets the model as a legal expert who explains legal texts in simple language for non-experts, without changing technical terms.
+- **User prompt:** Asks the model to rewrite the input legal text in a specified number of simple sentences in Persian.
+
+## Main Methods
+
+### 1. `single_section_representation(content)`
+- **Purpose:** Simplifies a single legal text section.
+- **Inputs:** 
+  - `content` (str): The legal text to be simplified.
+- **Outputs:** 
+  - `result` (bool): Operation status.
+  - `desc` (str): Description of the result.
+  - `sentences` (list): List of simplified sentences.
+
+### 2. `do_representation(sections)`
+- **Purpose:** Processes multiple sections and saves the results.
+- **Inputs:** 
+  - `sections` (dict): Dictionary where each key is a section ID and each value contains a `content` field.
+- **Outputs:** 
+  - `operation_result` (bool): Overall operation status.
+  - `sections` (dict): The input dictionary with an added `represented_sentences` field for each section.
+
+## Example Input
+
+```python
+sections = {
+    "1": {"content": "این یک متن حقوقی پیچیده است که باید ساده شود."},
+    "2": {"content": "متن حقوقی دوم برای بازنمایی."}
+}
+result, output_sections = do_representation(sections)
+```
+
+## Output
+
+Each section will have a new field `represented_sentences` containing the simplified sentences.
+
+## Notes
+
+- The script automatically uses GPU if available.
+- Errors for each section are logged in the `./data/represent/` directory.
+- The output JSON file is saved in `./data/represent/`.
--- a/readme/readme-representer-fa.md
+++ b/readme/readme-representer-fa.md
@ -0,0 +1,54 @@
+# اسکریپت بازنمایی جملات فارسی
+
+این اسکریپت (`p5_representer.py`) برای ساده‌سازی و بازنمایی جملات پیچیده حقوقی فارسی به مجموعه‌ای از جملات ساده‌تر و قابل فهم‌تر طراحی شده است. مدل مورد استفاده در این سورس `meta-llama/Meta-Llama-3.1-8B-Instruct` می‌باشد.
+
+**نکته:** برای مشاهده نسخه کتابخانه‌ها به فایل `requirements.txt` مراجعه کنید.
+
+## مدل مورد استفاده
+
+- مدل: `meta-llama/Meta-Llama-3.1-8B-Instruct`
+- بارگذاری با استفاده از کتابخانه Transformers (توکنایزر و مدل)
+
+## پرامپت‌های سیستمی و کاربری
+
+- **پرامپت سیستمی:** مدل را به عنوان یک وکیل حقوق‌دان معرفی می‌کند که باید متون حقوقی را بدون تغییر اصطلاحات فنی، به زبان ساده برای افراد غیرحقوق‌دان توضیح دهد.
+- **پرامپت کاربری:** از مدل می‌خواهد متن ورودی را در تعداد جمله مشخص، ساده و روان به زبان فارسی بازنویسی کند.
+
+## متدهای اصلی
+
+### ۱. `single_section_representation(content)`
+- **هدف:** ساده‌سازی یک بخش متنی حقوقی.
+- **ورودی:** 
+  - `content` (رشته): متن حقوقی برای ساده‌سازی.
+- **خروجی:** 
+  - `result` (بولین): وضعیت عملیات.
+  - `desc` (رشته): توضیح نتیجه.
+  - `sentences` (لیست): لیست جملات ساده‌شده.
+
+### ۲. `do_representation(sections)`
+- **هدف:** پردازش چندین بخش و ذخیره نتایج.
+- **ورودی:** 
+  - `sections` (دیکشنری): هر کلید شناسه بخش و مقدار آن شامل فیلد `content` است.
+- **خروجی:** 
+  - `operation_result` (بولین): وضعیت کلی عملیات.
+  - `sections` (دیکشنری): دیکشنری ورودی با فیلد جدید `represented_sentences` برای هر بخش.
+
+## مثال ورودی
+
+```python
+sections = {
+    "1": {"content": "این یک متن حقوقی پیچیده است که باید ساده شود."},
+    "2": {"content": "متن حقوقی دوم برای بازنمایی."}
+}
+result, output_sections = do_representation(sections)
+```
+
+## خروجی
+
+برای هر بخش، یک فیلد جدید به نام `represented_sentences` شامل جملات ساده‌شده اضافه می‌شود.
+
+## نکات
+
+- اسکریپت به صورت خودکار در صورت وجود GPU از آن استفاده می‌کند.
+- خطاهای هر بخش در مسیر `./data/represent/` ثبت می‌شوند.
+- فایل خروجی JSON در مسیر `./data/represent/` ذخیره می‌شود.