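This question is related to rcreswick's question about serializing Jena OntModel changes. I have Jena models on two (or more) machines that need to stay synchronized over sockets. The main issue I need to address is that the models may contain anonymous nodes (bnodes), which can originate in any of the models.
Question: Am I on the right track here, or is there a better, more robust approach that I'm not considering?
I can think of three approaches to this problem: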
- Serialize the complete model: This is prohibitively expensive for synchronizing small updates. Also, since changes can occur on either machine, I can't just replace machine B's model with the serialized model from machine A. I need to merge them.
- Serialize a partial model: Use a dedicated model for serialization that only contains the changes that need to be sent over the socket. This approach requires special vocabulary to represent statements that were removed from the model. Presumably, when I serialize the model from machine A to machine B, anonymous node IDs will be unique to machine A but may overlap with IDs for anonymous nodes created on machine B. Therefore, I'll have to rename anonymous nodes and keep a mapping from machine A's anonymous node IDs to machine B's IDs in order to handle future changes correctly.
- Serialize individual statements: This approach requires no special vocabulary, but may not be as robust. Are there issues other than anonymous nodes that I just haven't encountered yet?
- Generate globally unique bnode IDs (NEW): We can generate globally unique IDs for anonymous nodes by prefixing the ID with a unique machine ID. Unfortunately, I haven't figured out how to tell Jena to use my ID generator instead of its own. This would allow us to serialize individual statements without remapping bnode IDs.
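To make the last approach concrete, here is a minimal sketch of a machine-prefixed ID generator. It uses only the Java standard library and does not touch Jena; the class name `PrefixedBnodeIds` and the label format are hypothetical, and it assumes each machine is configured with a distinct machine ID. How to get Jena to use such labels is left open, as in the question.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: mints bnode labels intended to be unique across machines. */
final class PrefixedBnodeIds {
    // Assumption: this string is different on every machine (e.g. a host name or UUID).
    private final String machineId;
    private final AtomicLong counter = new AtomicLong();

    PrefixedBnodeIds(String machineId) {
        this.machineId = machineId;
    }

    /** Returns the next label, built as machine ID + "-" + a per-machine counter. */
    String next() {
        return machineId + "-" + counter.getAndIncrement();
    }
}
```

Note that the counter in this sketch starts again from zero after a restart, so a real implementation would probably need to persist it or mix in something like a start-up timestamp.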
Here is an example to ground this discussion. Suppose I have a list on machine A, represented as:
_:a rdf:first myns:tom
_:a rdf:rest rdf:nil
I serialize this model from machine A to machine B. Now, because machine B may already have an (unrelated) anonymous node with ID a, I remap ID a to a new ID b:
_:b rdf:first myns:tom
_:b rdf:rest rdf:nil
Now the list on machine A changes:
_:a rdf:first myns:tom
_:a rdf:rest _:b
_:b rdf:first myns:dick
_:b rdf:rest rdf:nil
Since machine B has never encountered machine A's ID b before, it adds a new mapping from machine A's ID b to a new ID c:
_:b rdf:first myns:tom
_:b rdf:rest _:c
_:c rdf:first myns:dick
_:c rdf:rest rdf:nil
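The bookkeeping on machine B in this example can be sketched as follows. This is only an illustration in plain Java, not Jena code: the class `BnodeRemapper`, the `_:` prefix convention for recognizing anonymous nodes, and the `_:localN` label scheme are all assumptions, and the sketch takes for granted that the generated local labels do not clash with anything else in the local model.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: translates one remote machine's bnode labels into local ones. */
final class BnodeRemapper {
    private final Map<String, String> remoteToLocal = new HashMap<>();
    private int nextLocal = 0;

    /**
     * Rewrites a single term of an incoming statement.
     * Terms that do not start with "_:" (URIs, literals) are returned unchanged.
     * A remote bnode label seen before maps to the same local label as last time;
     * a label seen for the first time gets a fresh local label.
     */
    String remap(String term) {
        if (!term.startsWith("_:")) {
            return term;
        }
        return remoteToLocal.computeIfAbsent(term, t -> "_:local" + nextLocal++);
    }
}
```

Applied to the two updates above, the first `remap("_:a")` creates a mapping, the later `remap("_:a")` reuses it, and `remap("_:b")` creates a second, distinct mapping, while `rdf:first`, `myns:tom` and `rdf:nil` pass through untouched.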
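With more than two machines, the problem gets more complicated. For example, if there is a third machine C, it may have its own anonymous node a, which is different from machine A's anonymous node a. So machine B really needs to keep a mapping from each of the other machines' anonymous node IDs to its local IDs, not just from remote IDs in general to local IDs. When processing incoming changes, it must take the source of the changes into account in order to map the IDs correctly.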
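One way to picture the per-source mapping is a two-level table keyed first by the sending machine and then by that machine's bnode label. The sketch below is plain Java with hypothetical names (`PerSourceBnodeMap`, `toLocal`, the `_:localN` labels); it assumes every incoming change carries an identifier for the machine it came from.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: keeps a separate remote-to-local bnode mapping per sending machine. */
final class PerSourceBnodeMap {
    // Outer key: ID of the sending machine. Inner key: that machine's bnode label.
    private final Map<String, Map<String, String>> bySource = new HashMap<>();
    private int nextLocal = 0;

    /** Returns the local label for (sourceMachine, remoteLabel), creating one if needed. */
    String toLocal(String sourceMachine, String remoteLabel) {
        return bySource
                .computeIfAbsent(sourceMachine, s -> new HashMap<>())
                .computeIfAbsent(remoteLabel, l -> "_:local" + nextLocal++);
    }
}
```

With this layout, `toLocal("A", "_:a")` and `toLocal("C", "_:a")` yield different local labels, while repeated calls with the same source and label return the same one.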