orgcatorg/xlm-v-base-ner
##################################################
##################################################
2024-07-18 16:35:12.660757: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-18 16:35:15,455 Reading data from data
2024-07-18 16:35:15,456 Train: data/peyma_train.txt
2024-07-18 16:35:15,456 Dev: None
2024-07-18 16:35:15,456 Test: None
2024-07-18 16:35:17,860 No test split found. Using 10% (i.e. 803 samples) of the train split as test data
2024-07-18 16:35:17,865 No dev split found. Using 10% (i.e. 722 samples) of the train split as dev data
2024-07-18 16:35:17,865 Computing label dictionary. Progress: 6503it [00:00, 35617.87it/s]
2024-07-18 16:35:18,051 Dictionary created for label 'ner' with 1072 values: O (seen 185595 times), های|O (seen 2277 times), ها|O (seen 1045 times), ای|O (seen 611 times), شود|O (seen 515 times), اند|O (seen 277 times), کند|O (seen 273 times), کنند|O (seen 183 times), هایی|O (seen 152 times), تواند|O (seen 124 times), ترین|O (seen 105 times), گذاری|O (seen 100 times), دهد|O (seen 100 times), جمله|O (seen 95 times), طور|O (seen 90 times), که|O (seen 87 times), تر|O (seen 82 times), شوند|O (seen 80 times), کنیم|O (seen 69 times), توان|O (seen 68 times)
model read successfully!
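A side note on the dictionary above: a PEYMA NER tagger should only need on the order of a dozen BIO labels, yet this run produced 1072 "values" with forms like `های|O` (a Persian token fused to the `O` tag). That pattern suggests the tag column was not cleanly separated from the token column when the corpus was read. A minimal, standalone sketch of how such a label dictionary is counted from a CoNLL-style file (the sample data and column handling here are hypothetical; the real run reads `data/peyma_train.txt` through Flair):

```python
from collections import Counter

# Hypothetical CoNLL-style sample: one "token<space>tag" pair per line,
# standing in for the real data/peyma_train.txt read by Flair.
sample = """\
سازمان B_ORG
ملل I_ORG
های O
"""

# Count every distinct value seen in the tag column. If the column split
# is wrong (e.g. token and tag arrive fused as "های|O"), every fused
# string becomes its own "label", which is how a dictionary balloons
# from ~a dozen BIO tags to 1072 values.
tags = Counter()
for line in sample.splitlines():
    if not line.strip():
        continue
    token, tag = line.rsplit(" ", 1)
    tags[tag] += 1

print(tags.most_common())
```

If this reading is right, the fix would be to check the column map passed when loading the corpus (in Flair, the `column_format` argument of `ColumnCorpus`) against the actual delimiter used in the file.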
##################################################
##################################################
2024-07-18 16:35:22,095 SequenceTagger predicts: Dictionary with 1072 tags: O, های|O, ها|O, ای|O, شود|O, اند|O, کند|O, کنند|O, هایی|O, تواند|O, ترین|O, گذاری|O, دهد|O, جمله|O, طور|O, که|O, تر|O, شوند|O, کنیم|O, توان|O, نام|O, رود|O, المللی|O, الله|O, سازی|O, کننده|O, گیری|O, گیرد|O, ی|O, وگو|O, توانند|O, ایم|O, ماه|I_DAT, دهند|O, کنم|O, اش|O, و, ریزی|O, های|I_ORG, رسد|O, زیست|O, شد|O, نامه|O, گوید|O, بینی|O, شان|O, از|O, خاطر|O, را|O, رسانی|O
2024-07-18 16:35:22,107 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,108 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(901630, 768)
        (position_embeddings): Embedding(514, 768, padding_idx=1)
        (token_type_embeddings): Embedding(1, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0-11): 12 x XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): XLMRobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): XLMRobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): XLMRobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): XLMRobertaPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=1072, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2024-07-18 16:35:22,108 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,108 Corpus: 6503 train + 722 dev + 803 test sentences
2024-07-18 16:35:22,108 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,108 Train: 6503 sentences
2024-07-18 16:35:22,108 (train_with_dev=False, train_with_test=False)
2024-07-18 16:35:22,108 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,108 Training Params:
2024-07-18 16:35:22,108 - learning_rate: "4e-05"
2024-07-18 16:35:22,108 - mini_batch_size: "10"
2024-07-18 16:35:22,108 - max_epochs: "200"
2024-07-18 16:35:22,109 - shuffle: "True"
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,109 Plugins:
2024-07-18 16:35:22,109 - LinearScheduler | warmup_fraction: '0.1'
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,109 Final evaluation on model after last epoch (final-model.pt)
2024-07-18 16:35:22,109 - metric: "('micro avg', 'f1-score')"
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,109 Computation:
2024-07-18 16:35:22,109 - compute on device: cuda:0
2024-07-18 16:35:22,109 - embedding storage: none
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,109 Model training base
path: "taggers"
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
The expanded size of the tensor (573) must match the existing size (514) at non-singleton dimension 1. Target sizes: [10, 573]. Tensor sizes: [1, 514]
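The crash above follows from the model printout: XLM-RoBERTa's `position_embeddings` table has only 514 rows (512 usable positions plus a 2-row offset reserved for the padding index), and this mini-batch was padded to 573 subtokens, so the position embeddings cannot be broadcast against it. The usual workaround is to split long subtoken sequences into overlapping windows that each fit the model's context. A standalone sketch of that windowing (the function name and stride are hypothetical, not Flair's implementation):

```python
def split_into_windows(ids, max_len=512, stride=256):
    """Split a subtoken-id sequence into overlapping windows of at most
    max_len, so each window fits XLM-R's 512 usable positions."""
    if len(ids) <= max_len:
        return [ids]
    windows = []
    start = 0
    while start < len(ids):
        windows.append(ids[start:start + max_len])
        if start + max_len >= len(ids):
            break  # the last window already reaches the end of the sequence
        start += stride
    return windows

long_seq = list(range(573))  # same subtoken length as the failing batch
chunks = split_into_windows(long_seq)
print([len(c) for c in chunks])  # → [512, 317]; both fit in 512 positions
```

In Flair specifically, `TransformerWordEmbeddings` exposes an `allow_long_sentences` flag that performs this kind of windowed encoding internally (if memory serves; worth checking against the installed Flair version); alternatively, over-long sentences can be truncated or split before training.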
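Separately, the `LinearScheduler | warmup_fraction: '0.1'` plugin listed in the training parameters ramps the learning rate linearly from 0 up to the configured peak (4e-05 here) over the first 10% of optimizer steps, then decays it linearly back toward 0. A minimal standalone reimplementation of that schedule (not Flair's own code; the step counts are illustrative):

```python
def linear_schedule(step, total_steps, peak_lr=4e-05, warmup_fraction=0.1):
    """Learning rate at a given step under linear warmup + linear decay."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # warmup: ramp linearly from 0 to peak_lr
        return peak_lr * step / warmup_steps
    # decay: ramp linearly from peak_lr back to 0 over the remaining steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 1000  # illustrative; the real run has ~651 batches/epoch * 200 epochs
print(linear_schedule(0, total))     # start of warmup
print(linear_schedule(100, total))   # end of warmup: peak learning rate
print(linear_schedule(1000, total))  # end of training
```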