Sentence Embedding Generator
This project provides a Python script (embedding.py) for generating sentence embeddings using the Sentence Transformers library.
Requirements
Before using this script, please install the required libraries:
pip install sentence-transformers numpy
How It Works
- The script uses the pre-trained model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2.
- There are two main functions:
  - single_section_embedder(sentence): Takes a sentence (string) and returns its embedding as a vector.
  - do_word_embedder(sections): Takes a dictionary of sections (each with a content field), generates embeddings for each section, and saves the results as a JSON file.
Usage
1. Get Embedding for a Single Sentence
from embedding import single_section_embedder
sentence = "This is a sample sentence."
embedding = single_section_embedder(sentence)
print(embedding)
2. Generate Embeddings for Multiple Sections and Save to File
Suppose your data is structured like this:
sections = {
"1": {"content": "First section text"},
"2": {"content": "Second section text"}
}
You can generate and save embeddings as follows:
from embedding import do_word_embedder
result = do_word_embedder(sections)
After running, a file with a name like sections_embeddings_YEAR-MONTH-DAY-HOUR.json will be created in the ./data/embeddings/ directory, containing the embeddings for each section.
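To locate the output file programmatically, the name can be rebuilt from a timestamp. The sketch below assumes the YEAR-MONTH-DAY-HOUR part is zero-padded (for example 2024-03-05-14); the exact format the script uses is not confirmed here, and embeddings_output_path is a hypothetical helper name.

```python
from datetime import datetime
from pathlib import Path


def embeddings_output_path(now: datetime, directory: str = "./data/embeddings/") -> Path:
    """Build the expected output path for a given timestamp (assumed format)."""
    # Assumption: year, month, day and hour are zero-padded and joined with hyphens.
    stamp = now.strftime("%Y-%m-%d-%H")
    return Path(directory) / f"sections_embeddings_{stamp}.json"
```

For instance, embeddings_output_path(datetime.now()) gives the path a run in the current hour would be expected to write to under this assumption.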
Output Structure
The output is a JSON file where each section has its embedding added:
{
"1": {
"content": "First section text",
"embeddings": [0.123, 0.456, ...]
},
...
}
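A saved file with this structure can be read back with the standard json module. The following is a minimal sketch; parse_embeddings is a hypothetical helper, and it assumes every section carries an "embeddings" list as shown above.

```python
import json
from typing import Dict, List


def parse_embeddings(json_text: str) -> Dict[str, List[float]]:
    """Map each section key to its embedding vector, given the output file's text."""
    data = json.loads(json_text)
    # Assumption: each section is an object with an "embeddings" list of numbers.
    return {key: section["embeddings"] for key, section in data.items()}
```

In practice the text would come from reading the output file, for example with open(path, encoding="utf-8").read().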
Notes
- Make sure the folder ./data/embeddings/ exists before running the script.
- The script supports Persian-language text.
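If the output folder mentioned in the notes is missing, it can be created before calling the script. This is a small sketch using only the standard library; ensure_output_dir is a hypothetical helper and is not part of embedding.py.

```python
from pathlib import Path


def ensure_output_dir(directory: str = "./data/embeddings/") -> Path:
    """Create the output directory (and any missing parents) if it does not exist."""
    path = Path(directory)
    # exist_ok=True makes repeated calls safe when the folder is already there.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling ensure_output_dir() once before do_word_embedder(sections) should be enough to satisfy the folder requirement.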