rename readme files

2025-08-21 15:12:06 +03:30 · 2025-08-21 15:12:06 +03:30 · 55b88ce189
commit 55b88ce189
parent 8b789db37c
3 changed files with 212 additions and 212 deletions
--- a/README.md
+++ b/README.md
@@ -1,139 +1,104 @@
 # NER (Named Entity Recognition)
-## Requirements
+# Training a Persian NER Model with Flair
 ````shell
 pip install flair
 ````
 ## Download Models
 Download the models and place them in the `data` folder:
 https://drive.google.com/file/d/1mBW3zA8sd1zDo7KOiUCXmG64h8eJc_ip/view
-## Getting started
+This project trains a named entity recognition (NER) model on Persian-language legal data.  
 The code uses the **Flair** library to train and fine-tune transformer-based models.
-````shell
+---
 python flair_ner_inference_.py
 ````
-To train:
+## Features
 - Persian language support
 - Uses pretrained transformer models
 - Saves training and evaluation results to a file
 - Can test the trained model on new data
-````shell
+---
 python flair_ner_train.py
 ````
-## Project Structure
+## Prerequisites
-A simple view of the project's dependency tree:
+Before running the code, install the following packages:
-
+```bash
-![project structure image](./images/project_structure.png)
+pip install flair transformers torch
 This image was made with **pylint** and **graphviz**:
 ````shell
 pip install pylint graphviz
 pyreverse -o png -p <YourProjectName> <path/to/your/project>
 ````
 ## Documentation
 Flair is:
 * **A powerful NLP library.** Flair allows you to apply our state-of-the-art natural language processing (NLP)
 models to your text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS),
  special support for [biomedical data](https://github.com/flairNLP/flair/blob/master/resources/docs/HUNFLAIR.md),
 sense disambiguation and classification, with support for a rapidly growing number of languages.
 * **A text embedding library.** Flair has simple interfaces that allow you to use and combine different word and
 document embeddings, including our proposed [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and various transformers.
 * **A PyTorch NLP framework.** Our framework builds directly on [PyTorch](https://pytorch.org/), making it easy to
 train your own models and experiment with new approaches using Flair embeddings and classes.
 ## Quick Start Flair
 ### Requirements and Installation
 In your favorite virtual environment, simply do:
 ```
 pip install flair
 ```
-Flair requires Python 3.7+. 
+---
-### Example 1: Tag Entities in Text
+## Initial Settings
 Three main training parameters are defined at the top of the code:
-Let's run **named entity recognition** (NER) over an example sentence. All you need to do is make a `Sentence`, load
+- **LEARNING_RATE**: learning rate (example: `0.65e-4`)
-a pre-trained model and use it to predict tags for the sentence:
+- **MINI_BATCH_SIZE**: mini-batch size (example: `8`)
 - **MAX_EPOCHS**: maximum number of training epochs (example: `100`)
-```python
+---
 from flair.data import Sentence
 from flair.nn import Classifier
-# make a sentence
+## Data Structure
-sentence = Sentence('I love Berlin .')
+The data must be placed in the `./data/` folder in column format (ColumnCorpus):
-
+```
-# load the NER tagger
+token label
 tagger = Classifier.load('ner')
 # run NER over sentence
 tagger.predict(sentence)
 # print the sentence with all annotations
 print(sentence)
 ```
-This should print:
+Example:
-
+```
-```console
+علی B-PER
-Sentence: "I love Berlin ." → ["Berlin"/LOC]
+به O
 دادگاه B-ORG
 رفت O
 ```
-This means that "Berlin" was tagged as a **location entity** in this sentence. 
+---
-   * *to learn more about NER tagging in Flair, check out our [NER tutorial](https://flairnlp.github.io/docs/tutorial-basics/tagging-entities)!*
+## Main Methods
 ### `main_train(model: str) -> bool`
 - **Input**: name of the transformer model (e.g. `"HooshvareLab/bert-fa-base-uncased-ner-peyma"`)
 - **Output**: a boolean (success or failure)
 - **Behavior**:
  1. Load the data and build the label dictionary
  2. Load and configure the embeddings
  3. Create the NER model with SequenceTagger
  4. Train the model with Flair's ModelTrainer
  5. Save the model and the training results
  6. Test the model on new data
  7. Evaluate performance and compute F1
-### Example 2: Detect Sentiment 
+---
-Let's run **sentiment analysis** over an example sentence to determine whether it is POSITIVE or NEGATIVE.
+## Running Training
 Same code as above, just a different model: 
-```python
+To train the model, run the main script:
-from flair.data import Sentence
+```bash
-from flair.nn import Classifier
+python train.py
 # make a sentence
 sentence = Sentence('I love Berlin .')
 # load the sentiment classifier
 tagger = Classifier.load('sentiment')
 # run sentiment analysis over the sentence
 tagger.predict(sentence)
 # print the sentence with all annotations
 print(sentence)
 ```
-This should print:
+The output model is saved in the `./taggers/` folder. The model name includes the training date and time.
-```console
+---
 Sentence[4]: "I love Berlin ." → POSITIVE (0.9983)
 ```
-This means that the sentence "I love Berlin" was tagged as having **POSITIVE** sentiment. 
+## Test Scenario
-   * *to learn more about sentiment analysis in Flair, check out our [sentiment analysis tutorial](https://flairnlp.github.io/docs/tutorial-basics/tagging-sentiment)!*
+After training finishes:
 1. A quick test is run on a simple input using `inference.py`.
 2. The model is evaluated with `evaluate_model.py`.
 3. The results are saved in `test-result.txt`.
-## Tutorials
+---
-On our new :fire: [**Flair documentation page**](https://flairnlp.github.io/docs/intro) you will find many tutorials to get you started!
+## Outputs
 - The trained model, in the `./taggers/` folder
 - The `test-result.txt` file, containing the training and evaluation results
 - The training log, for plotting charts
-In particular: 
+---
 - [Tutorial 1: Basic tagging](https://flairnlp.github.io/docs/category/tutorial-1-basic-tagging) → how to tag your text 
 - [Tutorial 2: Training models](https://flairnlp.github.io/docs/category/tutorial-2-training-models) → how to train your own state-of-the-art NLP models 
 - [Tutorial 3: Embeddings](https://flairnlp.github.io/docs/category/tutorial-3-embeddings) → how to produce embeddings for words and documents
-There is also a dedicated landing page for our [biomedical NER and datasets](/resources/docs/HUNFLAIR.md) with
+## Important Notes
-installation instructions and tutorials.
+- This code is designed for **legal** data, but it can also be used on other Persian data.
 - If training is interrupted, rerunning the process creates a new model with a different name.
 - To improve results, you can:
  - Change the learning rate.
  - Increase the mini-batch size.
  - Increase the number of epochs.
 ---
 ## Developers
 This project was developed for Persian natural language processing in the legal domain.
--- a/old-README.md
+++ b/old-README.md
@@ -0,0 +1,139 @@
 # NER (Named Entity Recognition)
 ## Requirements
 ````shell
 pip install flair
 ````
 ## Download Models
 Download the models and place them in the `data` folder:
 https://drive.google.com/file/d/1mBW3zA8sd1zDo7KOiUCXmG64h8eJc_ip/view
 ## Getting started
 ````shell
 python flair_ner_inference_.py
 ````
 To train:
 ````shell
 python flair_ner_train.py
 ````
 ## Project Structure
 A simple view of the project's dependency tree:
 ![project structure image](./images/project_structure.png)
 This image was made with **pylint** and **graphviz**:
 ````shell
 pip install pylint graphviz
 pyreverse -o png -p <YourProjectName> <path/to/your/project>
 ````
 ## Documentation
 Flair is:
 * **A powerful NLP library.** Flair allows you to apply our state-of-the-art natural language processing (NLP)
 models to your text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS),
  special support for [biomedical data](https://github.com/flairNLP/flair/blob/master/resources/docs/HUNFLAIR.md),
 sense disambiguation and classification, with support for a rapidly growing number of languages.
 * **A text embedding library.** Flair has simple interfaces that allow you to use and combine different word and
 document embeddings, including our proposed [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and various transformers.
 * **A PyTorch NLP framework.** Our framework builds directly on [PyTorch](https://pytorch.org/), making it easy to
 train your own models and experiment with new approaches using Flair embeddings and classes.
 ## Quick Start Flair
 ### Requirements and Installation
 In your favorite virtual environment, simply do:
 ```
 pip install flair
 ```
 Flair requires Python 3.7+. 
 ### Example 1: Tag Entities in Text
 Let's run **named entity recognition** (NER) over an example sentence. All you need to do is make a `Sentence`, load
 a pre-trained model and use it to predict tags for the sentence:
 ```python
 from flair.data import Sentence
 from flair.nn import Classifier
 # make a sentence
 sentence = Sentence('I love Berlin .')
 # load the NER tagger
 tagger = Classifier.load('ner')
 # run NER over sentence
 tagger.predict(sentence)
 # print the sentence with all annotations
 print(sentence)
 ```
 This should print:
 ```console
 Sentence: "I love Berlin ." → ["Berlin"/LOC]
 ```
 This means that "Berlin" was tagged as a **location entity** in this sentence. 
   * *to learn more about NER tagging in Flair, check out our [NER tutorial](https://flairnlp.github.io/docs/tutorial-basics/tagging-entities)!*
 ### Example 2: Detect Sentiment 
 Let's run **sentiment analysis** over an example sentence to determine whether it is POSITIVE or NEGATIVE.
 Same code as above, just a different model: 
 ```python
 from flair.data import Sentence
 from flair.nn import Classifier
 # make a sentence
 sentence = Sentence('I love Berlin .')
 # load the sentiment classifier
 tagger = Classifier.load('sentiment')
 # run sentiment analysis over the sentence
 tagger.predict(sentence)
 # print the sentence with all annotations
 print(sentence)
 ```
 This should print:
 ```console
 Sentence[4]: "I love Berlin ." → POSITIVE (0.9983)
 ```
 This means that the sentence "I love Berlin" was tagged as having **POSITIVE** sentiment. 
   * *to learn more about sentiment analysis in Flair, check out our [sentiment analysis tutorial](https://flairnlp.github.io/docs/tutorial-basics/tagging-sentiment)!*
 ## Tutorials
 On our new :fire: [**Flair documentation page**](https://flairnlp.github.io/docs/intro) you will find many tutorials to get you started!
 In particular: 
 - [Tutorial 1: Basic tagging](https://flairnlp.github.io/docs/category/tutorial-1-basic-tagging) → how to tag your text 
 - [Tutorial 2: Training models](https://flairnlp.github.io/docs/category/tutorial-2-training-models) → how to train your own state-of-the-art NLP models 
 - [Tutorial 3: Embeddings](https://flairnlp.github.io/docs/category/tutorial-3-embeddings) → how to produce embeddings for words and documents
 There is also a dedicated landing page for our [biomedical NER and datasets](/resources/docs/HUNFLAIR.md) with
 installation instructions and tutorials.
--- a/readme-train.md
+++ b/readme-train.md
@@ -1,104 +0,0 @@
 # Training a Persian NER Model with Flair
 This project trains a named entity recognition (NER) model on Persian-language legal data.  
 The code uses the **Flair** library to train and fine-tune transformer-based models.
 ---
 ## Features
 - Persian language support
 - Uses pretrained transformer models
 - Saves training and evaluation results to a file
 - Can test the trained model on new data
 ---
 ## Prerequisites
 Before running the code, install the following packages:
 ```bash
 pip install flair transformers torch
 ```
 ---
 ## Initial Settings
 Three main training parameters are defined at the top of the code:
 - **LEARNING_RATE**: learning rate (example: `0.65e-4`)
 - **MINI_BATCH_SIZE**: mini-batch size (example: `8`)
 - **MAX_EPOCHS**: maximum number of training epochs (example: `100`)
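The three settings above could be grouped as module-level constants, for example. This is a minimal sketch: the values are the examples quoted in the list, and the `TrainConfig` dataclass is a hypothetical convenience, not something the project is known to define.

```python
from dataclasses import dataclass

# Example values quoted in the list above; tune them for your own data.
LEARNING_RATE = 0.65e-4  # the same number as 6.5e-5
MINI_BATCH_SIZE = 8
MAX_EPOCHS = 100


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container that bundles the three training settings."""

    learning_rate: float = LEARNING_RATE
    mini_batch_size: int = MINI_BATCH_SIZE
    max_epochs: int = MAX_EPOCHS


config = TrainConfig()
```

Keeping the settings in one immutable object makes it harder for a run to use a mix of old and new values by accident.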
 ---
 ## Data Structure
 The data must be placed in the `./data/` folder in column format (ColumnCorpus):
 ```
 token label
 ```
 Example:
 ```
 علی B-PER
 به O
 دادگاه B-ORG
 رفت O
 ```
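To make the column format concrete, here is a small pure-Python reader for `token label` lines. It is only an illustration: in the project itself Flair's `ColumnCorpus` does the loading (typically with a column map such as `{0: "text", 1: "ner"}`). The helper name `read_column_sentences` is hypothetical, and the rule that a blank line ends a sentence is the usual convention for column files rather than something stated above. The last two sample lines are added here to show a sentence break.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str]  # (token, label)


def read_column_sentences(lines: Iterable[str]) -> List[List[Token]]:
    """Group `token label` lines into sentences; a blank line ends a sentence."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        # The label is the last whitespace-separated column.
        token, label = line.rsplit(maxsplit=1)
        current.append((token, label))
    if current:
        sentences.append(current)
    return sentences


# The four lines from the example above, plus a blank line and one extra token.
sample = ["علی B-PER", "به O", "دادگاه B-ORG", "رفت O", "", "او O"]
parsed = read_column_sentences(sample)
```

A line with only one column raises `ValueError` in this sketch, which is a cheap way to catch malformed rows before training.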
 ---
 ## Main Methods
 ### `main_train(model: str) -> bool`
 - **Input**: name of the transformer model (e.g. `"HooshvareLab/bert-fa-base-uncased-ner-peyma"`)
 - **Output**: a boolean (success or failure)
 - **Behavior**:
  1. Load the data and build the label dictionary
  2. Load and configure the embeddings
  3. Create the NER model with SequenceTagger
  4. Train the model with Flair's ModelTrainer
  5. Save the model and the training results
  6. Test the model on new data
  7. Evaluate performance and compute F1
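The steps above can be sketched with Flair's public training API. This is not the project's actual `main_train`; it is a hedged outline that assumes the data files are named `train.txt`, `dev.txt` and `test.txt`, that the label column is called `ner`, and that a fixed `output_dir` is acceptable. The extra `data_folder` and `output_dir` parameters are also assumptions added for illustration.

```python
def main_train(
    model: str,
    data_folder: str = "./data",
    output_dir: str = "./taggers/example",  # assumed path, not the project's naming
) -> bool:
    """Fine-tune `model` for NER with Flair; return True on success, False on failure."""
    try:
        # Imported lazily so this module can be imported without Flair installed.
        from flair.datasets import ColumnCorpus
        from flair.embeddings import TransformerWordEmbeddings
        from flair.models import SequenceTagger
        from flair.trainers import ModelTrainer

        # 1. Load the data and build the label dictionary.
        corpus = ColumnCorpus(
            data_folder,
            {0: "text", 1: "ner"},
            train_file="train.txt",  # assumed file names
            dev_file="dev.txt",
            test_file="test.txt",
        )
        label_dict = corpus.make_label_dictionary(label_type="ner")

        # 2. Load and configure the transformer embeddings.
        embeddings = TransformerWordEmbeddings(
            model=model, layers="-1", subtoken_pooling="first", fine_tune=True
        )

        # 3. Create the tagger (plain linear head, no CRF or RNN).
        tagger = SequenceTagger(
            hidden_size=256,
            embeddings=embeddings,
            tag_dictionary=label_dict,
            tag_type="ner",
            use_crf=False,
            use_rnn=False,
            reproject_embeddings=False,
        )

        # 4-5. Fine-tune; Flair writes the model and logs under output_dir.
        trainer = ModelTrainer(tagger, corpus)
        trainer.fine_tune(
            output_dir, learning_rate=0.65e-4, mini_batch_size=8, max_epochs=100
        )
        return True
    except Exception as exc:  # broad on purpose: the contract is a boolean result
        print(f"training failed: {exc}")
        return False
```

Steps 6 and 7 are not spelled out here. Flair normally evaluates on the test split when training ends and logs the scores, while the project runs its own scripts for the quick test and the evaluation.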
 ---
 ## Running Training
 To train the model, run the main script:
 ```bash
 python train.py
 ```
 The output model is saved in the `./taggers/` folder. The model name includes the training date and time.
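One way to build such a name is to format the current time into the folder name. The sketch below is an assumption about how this could be done: the helper `timestamped_model_dir` and the `%Y-%m-%d--%H-%M-%S` pattern are hypothetical, since the exact naming scheme is not given here.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_model_dir(base: str = "./taggers", now: Optional[datetime] = None) -> Path:
    """Return a model directory under `base` named after the given (or current) time."""
    now = now or datetime.now()
    # Hypothetical pattern; it avoids ':' so the name is valid on Windows too.
    return Path(base) / now.strftime("%Y-%m-%d--%H-%M-%S")


# A fixed time is passed here only to make the example reproducible.
example_dir = timestamped_model_dir(now=datetime(2025, 8, 21, 15, 12, 6))
```

Because every run gets its own folder, an interrupted run never overwrites an earlier model.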
 ---
 ## Test Scenario
 After training finishes:
 1. A quick test is run on a simple input using `inference.py`.
 2. The model is evaluated with `evaluate_model.py`.
 3. The results are saved in `test-result.txt`.
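A quick test of this kind might look like the sketch below. It does not reproduce `inference.py`. The function names `tag_text` and `format_entities` are hypothetical, the label type is assumed to be `ner`, the tab-separated layout is an assumption about the result file, and the `get_label` call assumes a recent Flair release.

```python
from typing import Iterable, List, Tuple

Entity = Tuple[str, str, float]  # (text, tag, confidence)


def tag_text(model_path: str, text: str) -> List[Entity]:
    """Load a trained Flair tagger and return the entities it finds in `text`."""
    # Imported lazily so the formatting helper below works without Flair.
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_path)
    sentence = Sentence(text)
    tagger.predict(sentence)
    entities: List[Entity] = []
    for span in sentence.get_spans("ner"):
        label = span.get_label("ner")
        entities.append((span.text, label.value, label.score))
    return entities


def format_entities(entities: Iterable[Entity]) -> str:
    """Render entities as tab-separated lines, e.g. for a results file."""
    return "\n".join(f"{text}\t{tag}\t{score:.4f}" for text, tag, score in entities)


# Illustrative values only, not real model output.
report = format_entities([("دادگاه", "ORG", 0.9871)])
```

In use, the string returned by `format_entities(tag_text(model_path, text))` could be written to `test-result.txt`.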
 ---
 ## Outputs
 - The trained model, in the `./taggers/` folder
 - The `test-result.txt` file, containing the training and evaluation results
 - The training log, for plotting charts
 ---
 ## Important Notes
 - This code is designed for **legal** data, but it can also be used on other Persian data.
 - If training is interrupted, rerunning the process creates a new model with a different name.
 - To improve results, you can:
  - Change the learning rate.
  - Increase the mini-batch size.
  - Increase the number of epochs.
 ---
 ## Developers
 This project was developed for Persian natural language processing in the legal domain.