orgcatorg/xlm-v-base-ner
##################################################
##################################################
2024-07-18 16:35:12.660757: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-18 16:35:15,455 Reading data from data
2024-07-18 16:35:15,456 Train: data/peyma_train.txt
2024-07-18 16:35:15,456 Dev: None
2024-07-18 16:35:15,456 Test: None
2024-07-18 16:35:17,860 No test split found. Using 10% (i.e. 803 samples) of the train split as test data
2024-07-18 16:35:17,865 No dev split found. Using 10% (i.e. 722 samples) of the train split as dev data
2024-07-18 16:35:17,865 Computing label dictionary. Progress: 6503it [00:00, 35617.87it/s]
2024-07-18 16:35:18,051 Dictionary created for label 'ner' with 1072 values: O (seen 185595 times), های|O (seen 2277 times), ها|O (seen 1045 times), ای|O (seen 611 times), شود|O (seen 515 times), اند|O (seen 277 times), کند|O (seen 273 times), کنند|O (seen 183 times), هایی|O (seen 152 times), تواند|O (seen 124 times), ترین|O (seen 105 times), گذاری|O (seen 100 times), دهد|O (seen 100 times), جمله|O (seen 95 times), طور|O (seen 90 times), که|O (seen 87 times), تر|O (seen 82 times), شوند|O (seen 80 times), کنیم|O (seen 69 times), توان|O (seen 68 times)
model read successfully!
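A side note on the dictionary above: a PEYMA NER tagger should only need on the order of a dozen BIO labels, yet this run produced 1072 "values" with forms like `های|O` (a Persian token fused to the `O` tag). That pattern suggests the tag column was not cleanly separated from the token column when the corpus was read. A minimal, standalone sketch of how such a label dictionary is counted from a CoNLL-style file (the sample data and column handling here are hypothetical; the real run reads `data/peyma_train.txt` through Flair):

```python
from collections import Counter

# Hypothetical CoNLL-style sample: one "token<space>tag" pair per line,
# standing in for the real data/peyma_train.txt read by Flair.
sample = """\
سازمان B_ORG
ملل I_ORG
های O
"""

# Count every distinct value seen in the tag column. If the column split
# is wrong (e.g. token and tag arrive fused as "های|O"), every fused
# string becomes its own "label", which is how a dictionary balloons
# from ~a dozen BIO tags to 1072 values.
tags = Counter()
for line in sample.splitlines():
    if not line.strip():
        continue
    token, tag = line.rsplit(" ", 1)
    tags[tag] += 1

print(tags.most_common())
```

If this reading is right, the fix would be to check the column map passed when loading the corpus (in Flair, the `column_format` argument of `ColumnCorpus`) against the actual delimiter used in the file.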
##################################################
##################################################
2024-07-18 16:35:22,095 SequenceTagger predicts: Dictionary with 1072 tags: O, های|O, ها|O, ای|O, شود|O, اند|O, کند|O, کنند|O, هایی|O, تواند|O, ترین|O, گذاری|O, دهد|O, جمله|O, طور|O, که|O, تر|O, شوند|O, کنیم|O, توان|O, نام|O, رود|O, المللی|O, الله|O, سازی|O, کننده|O, گیری|O, گیرد|O, ی|O, وگو|O, توانند|O, ایم|O, ماه|I_DAT, دهند|O, کنم|O, اش|O, و, ریزی|O, های|I_ORG, رسد|O, زیست|O, شد|O, نامه|O, گوید|O, بینی|O, شان|O, از|O, خاطر|O, را|O, رسانی|O
2024-07-18 16:35:22,107 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,108 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(901630, 768)
        (position_embeddings): Embedding(514, 768, padding_idx=1)
        (token_type_embeddings): Embedding(1, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0-11): 12 x XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): XLMRobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): XLMRobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): XLMRobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): XLMRobertaPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=1072, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2024-07-18 16:35:22,108 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,108 Corpus: 6503 train + 722 dev + 803 test sentences
2024-07-18 16:35:22,108 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,108 Train: 6503 sentences
2024-07-18 16:35:22,108 (train_with_dev=False, train_with_test=False)
2024-07-18 16:35:22,108 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,108 Training Params:
2024-07-18 16:35:22,108 - learning_rate: "4e-05"
2024-07-18 16:35:22,108 - mini_batch_size: "10"
2024-07-18 16:35:22,108 - max_epochs: "200"
2024-07-18 16:35:22,109 - shuffle: "True"
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,109 Plugins:
2024-07-18 16:35:22,109 - LinearScheduler | warmup_fraction: '0.1'
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,109 Final evaluation on model after last epoch (final-model.pt)
2024-07-18 16:35:22,109 - metric: "('micro avg', 'f1-score')"
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,109 Computation:
2024-07-18 16:35:22,109 - compute on device: cuda:0
2024-07-18 16:35:22,109 - embedding storage: none
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,109 Model training base
path: "taggers"
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
2024-07-18 16:35:22,109 ----------------------------------------------------------------------------------------------------
The expanded size of the tensor (573) must match the existing size (514) at non-singleton dimension 1. Target sizes: [10, 573]. Tensor sizes: [1, 514]
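The crash above follows from the model printout: XLM-RoBERTa's `position_embeddings` table has only 514 rows (512 usable positions plus a 2-row offset reserved for the padding index), and this mini-batch was padded to 573 subtokens, so the position embeddings cannot be broadcast against it. The usual workaround is to split long subtoken sequences into overlapping windows that each fit the model's context. A standalone sketch of that windowing (the function name and stride are hypothetical, not Flair's implementation):

```python
def split_into_windows(ids, max_len=512, stride=256):
    """Split a subtoken-id sequence into overlapping windows of at most
    max_len, so each window fits XLM-R's 512 usable positions."""
    if len(ids) <= max_len:
        return [ids]
    windows = []
    start = 0
    while start < len(ids):
        windows.append(ids[start:start + max_len])
        if start + max_len >= len(ids):
            break  # the last window already reaches the end of the sequence
        start += stride
    return windows

long_seq = list(range(573))  # same subtoken length as the failing batch
chunks = split_into_windows(long_seq)
print([len(c) for c in chunks])  # → [512, 317]; both fit in 512 positions
```

In Flair specifically, `TransformerWordEmbeddings` exposes an `allow_long_sentences` flag that performs this kind of windowed encoding internally (if memory serves; worth checking against the installed Flair version); alternatively, over-long sentences can be truncated or split before training.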
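Separately, the `LinearScheduler | warmup_fraction: '0.1'` plugin listed in the training parameters ramps the learning rate linearly from 0 up to the configured peak (4e-05 here) over the first 10% of optimizer steps, then decays it linearly back toward 0. A minimal standalone reimplementation of that schedule (not Flair's own code; the step counts are illustrative):

```python
def linear_schedule(step, total_steps, peak_lr=4e-05, warmup_fraction=0.1):
    """Learning rate at a given step under linear warmup + linear decay."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # warmup: ramp linearly from 0 to peak_lr
        return peak_lr * step / warmup_steps
    # decay: ramp linearly from peak_lr back to 0 over the remaining steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 1000  # illustrative; the real run has ~651 batches/epoch * 200 epochs
print(linear_schedule(0, total))     # start of warmup
print(linear_schedule(100, total))   # end of warmup: peak learning rate
print(linear_schedule(1000, total))  # end of training
```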