# Sentence Embedding Generator

This project provides a Python script (`p3_words_embedder.py`) for generating sentence embeddings with the Sentence Transformers library.

## Requirements

Before using the script, install the required libraries:

```bash
pip install sentence-transformers numpy
```

## How It Works

- The script uses the pre-trained model `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`.
- It exposes two main functions:
  - `single_section_embedder(sentence)`: takes a sentence (string) and returns its embedding as a vector.
  - `do_word_embedder(sections)`: takes a dictionary of sections (each with a `content` field), generates an embedding for each section, and saves the results as a JSON file.

## Usage

### 1. Get the Embedding for a Single Sentence

```python
from p3_words_embedder import single_section_embedder

sentence = "This is a sample sentence."
embedding = single_section_embedder(sentence)
print(embedding)
```

### 2. Generate Embeddings for Multiple Sections and Save Them to a File

Suppose your data is structured like this:

```python
sections = {
    "1": {"content": "First section text"},
    "2": {"content": "Second section text"}
}
```

You can generate and save the embeddings as follows:

```python
from p3_words_embedder import do_word_embedder

result = do_word_embedder(sections)
```

After it runs, a file named like `sections_embeddings_YEAR-MONTH-DAY-HOUR.json` is created in the `./data/embeddings/` directory, containing the embeddings for each section.

## Output Structure

The output is a JSON file in which each section has an `embeddings` field added:

```json
{
  "1": {
    "content": "First section text",
    "embeddings": [0.123, 0.456, ...]
  },
  ...
}
```

## Notes

- Make sure the `./data/embeddings/` folder exists before running the script.
- The script supports Persian text (the underlying model is multilingual).
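The section-embedding workflow described under *Usage* can be sketched as follows. This is a minimal sketch, not the contents of `p3_words_embedder.py`: the function name `embed_sections`, its `embed` and `out_dir` parameters, and the exact timestamp format are hypothetical. It assumes the embedding function returns a sequence of numbers (for example the NumPy array returned by `SentenceTransformer.encode`). Each vector is converted to a plain list of floats because the standard `json` module cannot serialize NumPy arrays; the output folder is created if missing, and the file is written as UTF-8 with `ensure_ascii=False` so Persian text stays readable.

```python
import json
import os
import tempfile
from datetime import datetime
from typing import Callable, Dict, Iterable, Tuple


def embed_sections(
    sections: Dict[str, dict],
    embed: Callable[[str], Iterable[float]],
    out_dir: str = "./data/embeddings",
) -> Tuple[Dict[str, dict], str]:
    """Hypothetical sketch: add an "embeddings" list to each section and save as JSON."""
    # Create the output folder up front instead of requiring it to exist.
    os.makedirs(out_dir, exist_ok=True)

    result: Dict[str, dict] = {}
    for key, section in sections.items():
        # float() turns NumPy scalars into plain Python floats, which json can serialize.
        vector = [float(x) for x in embed(section["content"])]
        # Copy the original section fields and add the embedding alongside them.
        result[key] = {**section, "embeddings": vector}

    # Assumed filename pattern: sections_embeddings_YEAR-MONTH-DAY-HOUR.json
    stamp = datetime.now().strftime("%Y-%m-%d-%H")
    path = os.path.join(out_dir, f"sections_embeddings_{stamp}.json")

    # ensure_ascii=False keeps Persian (and other non-Latin) text readable in the file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    return result, path


if __name__ == "__main__":
    # Stub embedder (hypothetical) so the sketch runs without downloading the model.
    def fake_embed(text: str) -> list:
        return [float(len(text)), 1.0]

    demo = {"1": {"content": "First section text"}, "2": {"content": "Second section text"}}
    with tempfile.TemporaryDirectory() as tmp:
        out, saved_to = embed_sections(demo, fake_embed, out_dir=tmp)
        print(saved_to)
        print(out["1"]["embeddings"])
```

To use the real model with this sketch, you could pass `SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2").encode` as `embed`. Injecting the embedding function keeps the file-handling logic testable without loading the model.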