rename readme files

2025-08-21 15:12:06 +03:30 · 2025-08-21 15:12:06 +03:30 · 55b88ce189
commit 55b88ce189
parent 8b789db37c
3 changed files with 212 additions and 212 deletions
--- a/README.md
+++ b/README.md
@@ -1,139 +1,104 @@
-# NER (Named Entity Recognition)

-## Requirements
-````shell
-pip install flair
-````
-## Download Models
-download models and place in data folder
-https://drive.google.com/file/d/1mBW3zA8sd1zDo7KOiUCXmG64h8eJc_ip/view
+# Training a Persian NER Model with Flair

-## Getting started
+This project trains a named entity recognition (NER) model on Persian legal data.  
+The code uses the **Flair** library to train and fine-tune transformer-based models.

-````shell
-python flair_ner_inference_.py
-````
+---

-for train:
+## Features
+- Persian language support
+- Uses pretrained transformer models
+- Saves training and evaluation results to a file
+- Can test the trained model on new data

-````shell
-python flair_ner_train.py
-````
+---

-## Project Structure
-A simple view of project's dependencies tree:
-
-![project structure image](./images/project_structure.png)
-
-This image made with **pylint** and **graphviz**:
-
-````shell
-pip install pylint graphviz
-pyreverse -o png -p <YourProjectName> <path/to/your/project>
-````
-
-## Documentation
-
-Flair is:
-
-* **A powerful NLP library.** Flair allows you to apply our state-of-the-art natural language processing (NLP)
-models to your text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS),
-  special support for [biomedical data](https://github.com/flairNLP/flair/blob/master/resources/docs/HUNFLAIR.md),
- sense disambiguation and classification, with support for a rapidly growing number of languages.
-
-* **A text embedding library.** Flair has simple interfaces that allow you to use and combine different word and
-document embeddings, including our proposed [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and various transformers.
-
-* **A PyTorch NLP framework.** Our framework builds directly on [PyTorch](https://pytorch.org/), making it easy to
-train your own models and experiment with new approaches using Flair embeddings and classes.
-
-
-## Quick Start Flair
-
-### Requirements and Installation
-
-In your favorite virtual environment, simply do:
-
-```
-pip install flair
+## Prerequisites
+Before running the code, install the following packages:
+```bash
+pip install flair transformers torch
 ```

-Flair requires Python 3.7+. 
+---

-### Example 1: Tag Entities in Text
+## Initial Configuration
+Three main training parameters are defined at the top of the code:

-Let's run **named entity recognition** (NER) over an example sentence. All you need to do is make a `Sentence`, load
-a pre-trained model and use it to predict tags for the sentence:
+- **LEARNING_RATE**: learning rate (example: `0.65e-4`)
+- **MINI_BATCH_SIZE**: mini-batch size (example: `8`)
+- **MAX_EPOCHS**: maximum number of training epochs (example: `100`)

-```python
-from flair.data import Sentence
-from flair.nn import Classifier
+---

-# make a sentence
-sentence = Sentence('I love Berlin .')
-
-# load the NER tagger
-tagger = Classifier.load('ner')
-
-# run NER over sentence
-tagger.predict(sentence)
-
-# print the sentence with all annotations
-print(sentence)
+## Data Structure
+The data must be placed in the `./data/` folder, in column format (ColumnCorpus):
+```
+token label
 ```

-This should print:
-
-```console
-Sentence: "I love Berlin ." → ["Berlin"/LOC]
+Example:
+```
+علی B-PER
+به O
+دادگاه B-ORG
+رفت O
 ```

-This means that "Berlin" was tagged as a **location entity** in this sentence. 
+---

-   * *to learn more about NER tagging in Flair, check out our [NER tutorial](https://flairnlp.github.io/docs/tutorial-basics/tagging-entities)!*
+## Main Methods

+### `main_train(model: str) -> bool`
+- **Input**: name of the transformer model (e.g. `"HooshvareLab/bert-fa-base-uncased-ner-peyma"`)
+- **Output**: a boolean (success or failure)
+- **Behavior**:
+  1. Load the data and build the label dictionary
+  2. Load and configure the embeddings
+  3. Create the NER model with SequenceTagger
+  4. Train the model using Flair's ModelTrainer
+  5. Save the model and the training results
+  6. Test the model on new data
+  7. Evaluate performance and compute F1

-### Example 2: Detect Sentiment 
+---

-Let's run **sentiment analysis** over an example sentence to determine whether it is POSITIVE or NEGATIVE.
-Same code as above, just a different model: 
+## Running Training

-```python
-from flair.data import Sentence
-from flair.nn import Classifier
-
-# make a sentence
-sentence = Sentence('I love Berlin .')
-
-# load the NER tagger
-tagger = Classifier.load('sentiment')
-
-# run NER over sentence
-tagger.predict(sentence)
-
-# print the sentence with all annotations
-print(sentence)
+To train the model, run the main script:
+```bash
+python train.py
 ```

-This should print:
+The output model is saved in the `./taggers/` folder. The model name includes the training date and time.

-```console
-Sentence[4]: "I love Berlin ." → POSITIVE (0.9983)
-```
+---

-This means that the sentence "I love Berlin" was tagged as having **POSITIVE** sentiment. 
+## Test Scenario

-   * *to learn more about sentiment analysis in Flair, check out our [sentiment analysis tutorial](https://flairnlp.github.io/docs/tutorial-basics/tagging-sentiment)!*
+After training finishes:
+1. A quick test on a simple input is run using `inference.py`.
+2. Model evaluation is run with `evaluate_model.py`.
+3. The results are saved to `test-result.txt`.

-## Tutorials
+---

-On our new :fire: [**Flair documentation page**](https://flairnlp.github.io/docs/intro) you will find many tutorials to get you started!
+## Outputs
+- The trained model, in the `./taggers/` folder
+- The `test-result.txt` file, containing the training and evaluation results
+- The training log, for plotting charts

-In particular: 
-- [Tutorial 1: Basic tagging](https://flairnlp.github.io/docs/category/tutorial-1-basic-tagging) → how to tag your text 
-- [Tutorial 2: Training models](https://flairnlp.github.io/docs/category/tutorial-2-training-models) → how to train your own state-of-the-art NLP models 
-- [Tutorial 3: Embeddings](https://flairnlp.github.io/docs/category/tutorial-3-embeddings) → how to produce embeddings for words and documents
+---

-There is also a dedicated landing page for our [biomedical NER and datasets](/resources/docs/HUNFLAIR.md) with
-installation instructions and tutorials.
+## Important Notes
+- This code is designed for **legal** data, but it can also be used on other Persian data.
+- If training is interrupted, re-running the process creates a new model with a different name.
+- To improve results, you can:
+  - Change the learning rate.
+  - Increase the mini-batch size.
+  - Increase the number of epochs.

+---
+
+## Developers
+This project was developed for Persian natural language processing in the legal domain.
--- a/old-README.md
+++ b/old-README.md
@@ -0,0 +1,139 @@
+# NER (Named Entity Recognition)
+
+## Requirements
+````shell
+pip install flair
+````
+## Download Models
+download models and place in data folder
+https://drive.google.com/file/d/1mBW3zA8sd1zDo7KOiUCXmG64h8eJc_ip/view
+
+## Getting started
+
+````shell
+python flair_ner_inference_.py
+````
+
+for train:
+
+````shell
+python flair_ner_train.py
+````
+
+## Project Structure
+A simple view of project's dependencies tree:
+
+![project structure image](./images/project_structure.png)
+
+This image made with **pylint** and **graphviz**:
+
+````shell
+pip install pylint graphviz
+pyreverse -o png -p <YourProjectName> <path/to/your/project>
+````
+
+## Documentation
+
+Flair is:
+
+* **A powerful NLP library.** Flair allows you to apply our state-of-the-art natural language processing (NLP)
+models to your text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS),
+  special support for [biomedical data](https://github.com/flairNLP/flair/blob/master/resources/docs/HUNFLAIR.md),
+ sense disambiguation and classification, with support for a rapidly growing number of languages.
+
+* **A text embedding library.** Flair has simple interfaces that allow you to use and combine different word and
+document embeddings, including our proposed [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and various transformers.
+
+* **A PyTorch NLP framework.** Our framework builds directly on [PyTorch](https://pytorch.org/), making it easy to
+train your own models and experiment with new approaches using Flair embeddings and classes.
+
+
+## Quick Start Flair
+
+### Requirements and Installation
+
+In your favorite virtual environment, simply do:
+
+```
+pip install flair
+```
+
+Flair requires Python 3.7+. 
+
+### Example 1: Tag Entities in Text
+
+Let's run **named entity recognition** (NER) over an example sentence. All you need to do is make a `Sentence`, load
+a pre-trained model and use it to predict tags for the sentence:
+
+```python
+from flair.data import Sentence
+from flair.nn import Classifier
+
+# make a sentence
+sentence = Sentence('I love Berlin .')
+
+# load the NER tagger
+tagger = Classifier.load('ner')
+
+# run NER over sentence
+tagger.predict(sentence)
+
+# print the sentence with all annotations
+print(sentence)
+```
+
+This should print:
+
+```console
+Sentence: "I love Berlin ." → ["Berlin"/LOC]
+```
+
+This means that "Berlin" was tagged as a **location entity** in this sentence. 
+
+   * *to learn more about NER tagging in Flair, check out our [NER tutorial](https://flairnlp.github.io/docs/tutorial-basics/tagging-entities)!*
+
+
+### Example 2: Detect Sentiment 
+
+Let's run **sentiment analysis** over an example sentence to determine whether it is POSITIVE or NEGATIVE.
+Same code as above, just a different model: 
+
+```python
+from flair.data import Sentence
+from flair.nn import Classifier
+
+# make a sentence
+sentence = Sentence('I love Berlin .')
+
+# load the NER tagger
+tagger = Classifier.load('sentiment')
+
+# run NER over sentence
+tagger.predict(sentence)
+
+# print the sentence with all annotations
+print(sentence)
+```
+
+This should print:
+
+```console
+Sentence[4]: "I love Berlin ." → POSITIVE (0.9983)
+```
+
+This means that the sentence "I love Berlin" was tagged as having **POSITIVE** sentiment. 
+
+   * *to learn more about sentiment analysis in Flair, check out our [sentiment analysis tutorial](https://flairnlp.github.io/docs/tutorial-basics/tagging-sentiment)!*
+
+## Tutorials
+
+On our new :fire: [**Flair documentation page**](https://flairnlp.github.io/docs/intro) you will find many tutorials to get you started!
+
+In particular: 
+- [Tutorial 1: Basic tagging](https://flairnlp.github.io/docs/category/tutorial-1-basic-tagging) → how to tag your text 
+- [Tutorial 2: Training models](https://flairnlp.github.io/docs/category/tutorial-2-training-models) → how to train your own state-of-the-art NLP models 
+- [Tutorial 3: Embeddings](https://flairnlp.github.io/docs/category/tutorial-3-embeddings) → how to produce embeddings for words and documents
+
+There is also a dedicated landing page for our [biomedical NER and datasets](/resources/docs/HUNFLAIR.md) with
+installation instructions and tutorials.
+
--- a/readme-train.md
+++ b/readme-train.md
@@ -1,104 +0,0 @@
-
-# Training a Persian NER Model with Flair
-
-This project trains a named entity recognition (NER) model on Persian legal data.  
-The code uses the **Flair** library to train and fine-tune transformer-based models.
-
----
-
-## Features
-- Persian language support
-- Uses pretrained transformer models
-- Saves training and evaluation results to a file
-- Can test the trained model on new data
-
----
-
-## Prerequisites
-Before running the code, install the following packages:
-```bash
-pip install flair transformers torch
-```
-
----
-
-## Initial Configuration
-Three main training parameters are defined at the top of the code:
-
-- **LEARNING_RATE**: learning rate (example: `0.65e-4`)
-- **MINI_BATCH_SIZE**: mini-batch size (example: `8`)
-- **MAX_EPOCHS**: maximum number of training epochs (example: `100`)
-
----
-
-## Data Structure
-The data must be placed in the `./data/` folder, in column format (ColumnCorpus):
-```
-token label
-```
-
-Example:
-```
-علی B-PER
-به O
-دادگاه B-ORG
-رفت O
-```
-
----
-
-## Main Methods
-
-### `main_train(model: str) -> bool`
-- **Input**: name of the transformer model (e.g. `"HooshvareLab/bert-fa-base-uncased-ner-peyma"`)
-- **Output**: a boolean (success or failure)
-- **Behavior**:
-  1. Load the data and build the label dictionary
-  2. Load and configure the embeddings
-  3. Create the NER model with SequenceTagger
-  4. Train the model using Flair's ModelTrainer
-  5. Save the model and the training results
-  6. Test the model on new data
-  7. Evaluate performance and compute F1
-
----
-
-## Running Training
-
-To train the model, run the main script:
-```bash
-python train.py
-```
-
-The output model is saved in the `./taggers/` folder. The model name includes the training date and time.
-
----
-
-## Test Scenario
-
-After training finishes:
-1. A quick test on a simple input is run using `inference.py`.
-2. Model evaluation is run with `evaluate_model.py`.
-3. The results are saved to `test-result.txt`.
-
----
-
-## Outputs
-- The trained model, in the `./taggers/` folder
-- The `test-result.txt` file, containing the training and evaluation results
-- The training log, for plotting charts
-
----
-
-## Important Notes
-- This code is designed for **legal** data, but it can also be used on other Persian data.
-- If training is interrupted, re-running the process creates a new model with a different name.
-- To improve results, you can:
-  - Change the learning rate.
-  - Increase the mini-batch size.
-  - Increase the number of epochs.
-
----
-
-## Developers
-This project was developed for Persian natural language processing in the legal domain.
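
For reference, the two-column data format described in the README moved by this commit (one token and its label per line) can be read with a few lines of plain Python. This is only an illustrative sketch and is not taken from the repository: the function name `parse_column_data` is hypothetical, and it assumes that blank lines separate sentences, which the README does not state. In the project itself this format is presumably loaded through Flair's `ColumnCorpus`, as the README suggests.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]


def parse_column_data(text: str) -> List[Sentence]:
    """Parse 'token label' lines into sentences of (token, label) pairs.

    Assumption: a blank line ends the current sentence (CoNLL-style).
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # Blank line: close the sentence collected so far, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        # The label is the last whitespace-separated field on the line.
        token, label = line.rsplit(maxsplit=1)
        current.append((token, label))
    if current:
        sentences.append(current)
    return sentences


# The sample sentence from the README's data example.
sample = "علی B-PER\nبه O\nدادگاه B-ORG\nرفت O\n"
sentences = parse_column_data(sample)
labels = [label for _, label in sentences[0]]
print(labels)
```

A line that holds only a token and no label raises a `ValueError` in this sketch, which makes malformed rows easy to spot before training starts.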