Graph building by LLM: code and test results

mmpouya 2025-03-16 23:33:12 +03:30
parent 6a06318a85
commit 7216d57425
6 changed files with 25364 additions and 0 deletions


@@ -0,0 +1,156 @@
from g4f.client import Client
import json
import time
import re
def remove_think_tags(strings):
    """Strip <think>...</think> blocks from a string or from each string in a list."""
    pattern = r'<think>[\s\S]*?</think>'
    if isinstance(strings, str):
        return re.sub(pattern, '', strings)
    return [re.sub(pattern, '', s) for s in strings]
system_prompt_graph_builder_v2 = """You are tasked with converting a short legal text into an OWL model. The text is in natural language and may not explicitly describe entities. Your goal is to create a fine-grained graph representation where each noun and verb is represented as a node, and relationships between them are clearly defined.
Follow these steps:
Read the Text: Carefully analyze the input legal text to understand its meaning and context.
Identify Nouns and Verbs: Extract all nouns and verbs from the text. For each identified noun and verb, create a corresponding node in the OWL model.
Establish Relationships: Determine the relationships between the nouns and verbs. For example, if a noun is the subject of a verb, create a relationship that reflects this connection. Note: Do not model nouns and verbs as object properties. Instead, treat them as distinct entities in the graph.
Add Annotations: Provide clear annotations (e.g., using rdfs:label and rdfs:comment) with Persian labels and descriptions for enhanced clarity.
Construct the OWL Model: Using the identified nodes and relationships, construct the OWL model. Ensure that the model is structured in a way that allows for easy querying and understanding of the legal context.
Output the OWL Representation: Present the final OWL model in TTL (Turtle) serialization format, ensuring that all nodes and relationships are properly defined."""
system_prompt_graph_builder_v3 = """You are a legal expert as well as a specialist in OWL modeling. Your task is to produce a thorough and precise OWL model for every sentence provided. For each concept or identity, follow these rules:
Select an appropriate English identifier and assign it a Persian label.
For terms that describe an event, model them as named individuals belonging to a relevant class that is a subclass of go:event.
For terms that describe an attribute or quality, model them as named individuals belonging to a relevant class that is a subclass of go:quality.
For all other identities (those representing substances), model them as named individuals in a relevant class that is a subclass of go:substance.
Construct predicates to capture the relationships between events, substances, and qualities that reflect both their semantic and syntactic connections.
Finally, summarize the complete model output in Turtle format."""
system_prompt_graph_builder_v3_1 = """You are a lawyer, a legal expert, and an authority on OWL modeling. Your task is to provide a detailed and accurate OWL model for each sentence provided to you. The modeling must conform to the following guidelines:
For every entity mentioned:
Determine whether the entity is specific or undetermined.
If the entity is specific, treat it as a named individual and assign an identifier that begins with the prefix "go" (e.g., goEntityName). Label it additionally with a corresponding Persian word.
If the entity is undetermined, use the prefix "ex" (e.g., exEntityName) for its identifier and label it with a related Persian term.
Every entity, regardless of type, must be declared as an instance of its corresponding class.
Categorize words based on their nature:
If a word represents an event, model it as a class that is a subclass of go:event.
If a word represents an attribute or quality, model it as a class that is a subclass of go:quality.
For all other entities (substances), model them as classes that are a subclass of go:substance.
When assigning identifiers to the entities, use a singular form of the word (avoid plurals).
Define relationships:
For events, substances, and qualities, design predicates (object properties) to capture the semantic and syntactic relationships accurately between them. The relationship names should be descriptive enough to reflect the intended interaction between the entities.
Output:
Write the complete OWL model in Turtle serialization.
Ensure that every part of your model (i.e., classes, individuals, and predicates) is correctly formatted for use in an OWL ontology.
Your response should include both an explanation of the modeling choices (if needed) and the Turtle code representing the ontology."""
system_prompt_graph_builder_V3_2 = """You are an expert lawyer and a specialist in OWL modeling. Your task is to provide a detailed and accurate OWL representation for each given sentence. Follow these instructions step-by-step:
Determine whether each sentence states:
A fact.
A conditional sentence expressing a hypothetical, uncertain scenario.
For factual sentences:
Create an OWL representation using named individuals.
Ensure each individual is declared as an instance of an appropriate class.
For conditional or hypothetical sentences:
Create an OWL representation that models the hypothetical scenario.
Prefix all related OWL expressions with "ex:" to denote their hypothetical status.
Entity Representation:
For each entity mentioned, select an appropriate English identifier and provide a corresponding Persian label.
When the entity represents an event, define it as a class that is a subclass of the class go:event.
When the entity represents an attribute or quality, define it as a class that is a subclass of the class go:quality.
For all remaining entities (i.e., substances), define them as classes that are subclasses of the class go:substance.
Use only singular words for all class/individual identifiers.
Relationships:
Design OWL predicates to capture the semantic and syntactic relationships between events, substances, and qualities as expressed in the sentences.
Output:
Write the complete OWL model in Turtle syntax.
"""
system_prompt_graph_builder = """You are given a sentence in Persian that describes a real-world event, relationship, or state. Your task is to convert the sentence into a series of RDF triples based on an underlying ontology. Follow these general guidelines and consider both syntactic and semantic relationships:
Entity Identification:
Recognize and extract key entities (individuals, organizations, objects, etc.) and model them as substances. Prefix these identifiers with 'go:' and denote them with rdf:type go:substance.
For example, 'وزارت نیرو' might be mapped to go:ministryOfPowerAndEnergy.
Event and Action Representation:
Identify verbs and action phrases that describe events or states (e.g., establishing, protecting, violating, warning).
Define these actions as RDF event classes using prefix 'go:' (e.g., go:establish, go:protect) and represent specific occurrences with instance identifiers using the prefix 'ex:'.
Where necessary, denote events using additional classifications such as go:Obligatory to capture duties, mandates, or permissions.
RDF Properties and Thematic Roles:
Use RDF properties to express relationships between entities and events. Common properties include, but are not limited to: go:HasAgent, indicating the actor or initiator of an action; go:HasTheme, indicating the affected entity or object of the event; go:HasRecipient, indicating the target or beneficiary of an action; go:HasTime, for temporal details such as a specific year; and go:TriggeredBy, specifying causation between events.
Adapt thematic roles to the specific verb or action in the sentence. The roles may vary based on the verb's semantics (for example, roles for verbs like 'warn', 'establish', or 'violate' may differ).
Additionally, when mapping relationships such as between a substance and its attributes, consider designing intermediary nodes (with the provided prefixes) that capture more complex syntactic and semantic relationships.
Consistent Naming and Ontology Mapping:
For all new entities, event types, and properties, maintain consistency with the naming conventions (using the 'go:' and 'ex:' prefixes).
Use subclass relationships where applicable (e.g., rdfs:subClassOf go:event for event types, rdfs:subClassOf go:substance for broad classes of entities).
These are some examples:
Input:
وزارت نیرو در 1402 یک نیروگاه برق تأسیس کرد
Output:
go:ministryOfPowerAndEnergy rdf:type go:substance.
go:establish rdfs:subClassOf go:event.
go:PowerPlant rdf:type go:substance.
ex:establish1 rdf:type go:establish.
ex:PowerPlant1 rdf:type go:PowerPlant.
ex:establish1 go:HasAgent go:ministryOfPowerAndEnergy.
ex:establish1 go:HasTheme ex:PowerPlant1.
ex:establish1 go:HasTime go:year1402.
Input:
سازمان محیط زیست وظیفه حفظ محیط زیست را بر عهده دارد
Output:
go:DepartmentOfEnvironment rdf:type go:substance.
go:protect rdfs:subClassOf go:event.
go:Obligatory rdfs:subClassOf go:event.
go:Environment rdf:type go:substance.
ex:protect1 rdf:type go:protect.
ex:environment1 rdf:type go:Environment.
ex:protect1 go:HasAgent go:DepartmentOfEnvironment.
ex:protect1 go:HasTheme ex:environment1.
ex:protect1 rdf:type go:Obligatory.
go:DepartmentOfEnvironment go:HasDuty ex:protect1.
"""
simple = ""
total_time_list = []
max_retries = 10  # Maximum number of attempts per item
client = Client()  # A single client is reused for every request
with open('API/mj_qa_section1_simplified.json', 'r', encoding='utf-8') as file:
    data = json.load(file)
for index, item in enumerate(data):
    simple = item.get("simplified")
    retry_count = 0
    x = None
    # Send the request inside the loop so that a failed attempt is actually retried.
    while not x and retry_count < max_retries:
        start_time = time.time()  # Time only the current attempt, not earlier sleeps
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "system", "content": system_prompt_graph_builder},
                          {"role": "user", "content": simple}],
                temperature=0.2,
                web_search=False
            )
            x = response.choices[0].message.content
        except Exception as error:
            print(f"Oops! need sleep. let's have another try. ({error})")
        if not x:
            retry_count += 1
            time.sleep(2)
    item["OWL"] = x  # Stays None if every attempt failed
    data[index] = item
    end_time = time.time()
    total_time = end_time - start_time
    print(f"elapsed time: {total_time}")
    total_time_list.append(total_time)
    ave = sum(total_time_list) / len(total_time_list)
    print(f"average time: {ave}")
    i = index + 1
    print(f"Result for item {item['id']}, it is the {i}th one")
    # Save after every item so partial progress survives an interruption.
    with open('API/results.json', 'w', encoding='utf-8') as file:
        json.dump(data, file, indent=4, ensure_ascii=False)
    print(f"Results saved to 'API/results.json'. {i}th OWL added.")
    time.sleep(1)
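
The script defines `remove_think_tags` but stores the raw reply in `item["OWL"]`. The following is a minimal sketch of a post-processing step that could run before the reply is stored. It assumes the model may wrap its reasoning in `<think>` tags and its triples in a Markdown code fence; neither is confirmed by this commit. `clean_owl_output` is a hypothetical helper name, not part of the commit.

```python
import re

# Same pattern the script uses in remove_think_tags.
THINK_RE = re.compile(r'<think>[\s\S]*?</think>')

# A Markdown fence marker, built programmatically (assumed reply format).
FENCE = "`" * 3
# Matches a fenced block with an optional language tag and captures its body.
FENCE_RE = re.compile(FENCE + r'[a-zA-Z]*\n([\s\S]*?)' + FENCE)


def clean_owl_output(reply):
    """Drop <think> blocks and, if fenced blocks exist, keep only their content."""
    text = THINK_RE.sub('', reply)
    blocks = FENCE_RE.findall(text)
    if blocks:
        # Hypothetical choice: join all fenced blocks and discard surrounding prose.
        text = '\n'.join(block.strip() for block in blocks)
    return text.strip()


raw = ("<think>plan</think>Here is the model:\n"
       + FENCE + "turtle\nex:a rdf:type go:b .\n" + FENCE)
print(clean_owl_output(raw))
```

A reply without any fence is returned unchanged apart from the removed `<think>` blocks and trimmed whitespace, so the helper would also be safe for prompts whose examples use bare triples.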

File diff suppressed because one or more lines are too long
