The Golem ontology parsing library.
This module contains the main class which parses Golem/CML dictionaries, as defined by the CML and Golem schemata, and allows you to use them to extract and convert information found in CML datafiles.
Main class for representing CML/Golem dictionaries.
Example of usage:
>>> from StringIO import StringIO
>>> dictionarystring = """<?xml version="1.0"?>
... <dictionary
... namespace="http://www.materialsgrid.org/castep/dictionary"
... dictionaryPrefix="castep"
... title="CASTEP Dictionary"
... xmlns="http://www.xml-cml.org/schema"
... xmlns:h="http://www.w3.org/1999/xhtml/"
... xmlns:cml="http://www.xml-cml.org/schema"
... xmlns:xsd="http://www.w3.org/2001/XMLSchema"
... xmlns:golem="http://www.lexical.org.uk/golem"
... xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
... <entry id="xcFunctional" term="Exchange-Correlation Functional">
... <annotation />
... <definition>
... The exchange-correlation functional used.
... </definition>
... <description>
... <h:div class="dictDescription">
... Available values for this are:
... <h:ul>
... <h:li>
... <h:strong>LDA</h:strong>
... , the Local Density Approximation
... </h:li>
... <h:li>
... <h:strong>PW91</h:strong>
... , Perdew and Wang's 1991 formulation
... </h:li>
... <h:li>
... <h:strong>PBE</h:strong>
... , Perdew, Burke and Ernzerhof's original GGA
... functional
... </h:li>
... <h:li>
... <h:strong>RPBE</h:strong>
... , Hammer et al's revised PBE functional
... </h:li>
... </h:ul>
... </h:div>
... </description>
...
... <metadataList>
... <metadata name="dc:author" content="golem-kiln" />
... </metadataList>
... <golem:xpath>/cml:cml/cml:parameterList[@dictRef="input"]/cml:parameter[@dictRef="castep:xcFunctional"]</golem:xpath>
... <golem:template call="scalar" role="getvalue" binding="pygolem_serialization" />
... <golem:template role="arb_to_input" binding="input" input="external">
... <xsl:stylesheet version='1.0'
... xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
... xmlns:cml='http://www.xml-cml.org/schema'>
... <xsl:strip-space elements="*" />
... <xsl:output method="text" />
... <xsl:param name="p1" />
... <xsl:template match="/">
... <xsl:text>XC_FUNCTIONAL </xsl:text><xsl:value-of select="$p1" />
... </xsl:template>
... </xsl:stylesheet>
... </golem:template>
... <golem:implements>convertibleToInput</golem:implements>
... <golem:implements>value</golem:implements>
... <golem:implements>absolute</golem:implements>
... <golem:childOf>input</golem:childOf>
...
... <golem:possibleValues type="string">
... <golem:enumeration>
... <golem:value>LDA</golem:value>
... <golem:value>PW91</golem:value>
... <golem:value>PBE</golem:value>
... <golem:value>RPBE</golem:value>
... <golem:value>HF</golem:value>
... <golem:value>SHF</golem:value>
... <golem:value>EXX</golem:value>
... <golem:value>SX</golem:value>
... <golem:value>ZERO</golem:value>
... <golem:value>HF-LDA</golem:value>
... <golem:value>SHF-LDA</golem:value>
... <golem:value>EXX-LDA</golem:value>
... <golem:value>SX-LDA</golem:value>
... </golem:enumeration>
... </golem:possibleValues>
... </entry>
...
... <entry id="scalar" term="Scalar default call">
... <annotation />
... <definition />
... <description />
... <metadataList />
... <golem:template role="getvalue" binding="pygolem_serialization">
... <xsl:stylesheet version='1.0'
... xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
... xmlns:cml='http://www.xml-cml.org/schema'
... xmlns:str="http://exslt.org/strings"
... xmlns:func="http://exslt.org/functions"
... xmlns:exsl="http://exslt.org/common"
... xmlns:tohw="http://www.uszla.me.uk/xsl/1.0/functions"
... extension-element-prefixes="func exsl tohw str"
... exclude-result-prefixes="exsl func tohw xsl str">
... <xsl:output method="text" />
...
...
... <func:function name="tohw:isAListOfDigits">
... <!-- look only for [0-9]+ -->
... <xsl:param name="x_"/>
... <xsl:variable name="x" select="normalize-space($x_)"/>
... <xsl:choose>
... <xsl:when test="string-length($x)=0">
... <func:result select="false()"/>
... </xsl:when>
... <xsl:when test="substring($x, 1, 1)='0' or
... substring($x, 1, 1)='1' or
... substring($x, 1, 1)='2' or
... substring($x, 1, 1)='3' or
... substring($x, 1, 1)='4' or
... substring($x, 1, 1)='5' or
... substring($x, 1, 1)='6' or
... substring($x, 1, 1)='7' or
... substring($x, 1, 1)='8' or
... substring($x, 1, 1)='9'">
... <xsl:choose>
... <xsl:when test="string-length($x)=1">
... <func:result select="true()"/>
... </xsl:when>
... <xsl:otherwise>
... <func:result select="tohw:isAListOfDigits(substring($x, 2))"/>
... </xsl:otherwise>
... </xsl:choose>
... </xsl:when>
... <xsl:otherwise>
... <func:result select="false()"/>
... </xsl:otherwise>
... </xsl:choose>
... </func:function>
...
... <func:function name="tohw:isAnInteger">
... <!-- numbers fitting [\+-][0-9]+ -->
... <xsl:param name="x_"/>
... <xsl:variable name="x" select="normalize-space($x_)"/>
... <xsl:variable name="try">
... <xsl:choose>
... <xsl:when test="starts-with($x, '+')">
... <xsl:value-of select="substring($x,2)"/>
... </xsl:when>
... <xsl:when test="starts-with($x, '-')">
... <xsl:value-of select="substring($x,2)"/>
... </xsl:when>
... <xsl:otherwise>
... <xsl:value-of select="$x"/>
... </xsl:otherwise>
... </xsl:choose>
... </xsl:variable>
... <func:result select="tohw:isAListOfDigits($try)"/>
... </func:function>
...
... <func:function name="tohw:isANumberWithoutExponent">
... <!-- numbers fitting [\+-][0-9]+(\.[0-9]*) -->
... <xsl:param name="x"/>
... <xsl:choose>
... <xsl:when test="contains($x, '.')">
... <func:result select="tohw:isAnInteger(substring-before($x, '.')) and
... tohw:isAListOfDigits(substring-after($x, '.'))"/>
... </xsl:when>
... <xsl:otherwise>
... <func:result select="tohw:isAnInteger($x)"/>
... </xsl:otherwise>
... </xsl:choose>
... </func:function>
...
... <func:function name="tohw:isAnFPNumber">
... <!-- Try and interpret a string as an exponential number -->
... <!-- should only recognise strings of the form: [\+-][0-9]*\.[0-9]*([DdEe][+-][0-9]+)? -->
... <xsl:param name="x"/>
... <xsl:choose>
... <xsl:when test="contains($x, 'd')">
... <func:result select="tohw:isANumberWithoutExponent(substring-before($x, 'd')) and
... tohw:isAnInteger(substring-after($x, 'd'))"/>
... </xsl:when>
... <xsl:when test="contains($x, 'D')">
... <func:result select="tohw:isANumberWithoutExponent(substring-before($x, 'D')) and
... tohw:isAnInteger(substring-after($x, 'D'))"/>
... </xsl:when>
... <xsl:when test="contains($x, 'e')">
... <func:result select="tohw:isANumberWithoutExponent(substring-before($x, 'e')) and
... tohw:isAnInteger(substring-after($x, 'e'))"/>
... </xsl:when>
... <xsl:when test="contains($x, 'E')">
... <func:result select="tohw:isANumberWithoutExponent(substring-before($x, 'E')) and
... tohw:isAnInteger(substring-after($x, 'E'))"/>
... </xsl:when>
... <xsl:otherwise>
... <func:result select="tohw:isANumberWithoutExponent($x)"/>
... </xsl:otherwise>
... </xsl:choose>
... </func:function>
...
... <xsl:template match="/">
... <xsl:apply-templates />
... </xsl:template>
...
... <xsl:template match="cml:scalar">
... <xsl:variable name="value">
... <xsl:choose>
... <xsl:when test="tohw:isAnFPNumber(.)">
... <xsl:value-of select="." />
... </xsl:when>
... <xsl:otherwise>
... <xsl:text>"</xsl:text><xsl:value-of select="." /><xsl:text>"</xsl:text>
... </xsl:otherwise>
... </xsl:choose>
... </xsl:variable>
... <xsl:variable name="units">
... <xsl:choose>
... <xsl:when test="@units">
... <xsl:text>"</xsl:text><xsl:value-of select="@units" /><xsl:text>"</xsl:text>
... </xsl:when>
... <xsl:otherwise>
... <xsl:text>""</xsl:text>
... </xsl:otherwise>
... </xsl:choose>
... </xsl:variable>
... <xsl:text>[</xsl:text><xsl:value-of select="$value"/><xsl:text>,</xsl:text><xsl:value-of select="$units" /><xsl:text>]</xsl:text>
... </xsl:template>
... </xsl:stylesheet>
... </golem:template>
...
... <golem:template role="defaultoutput">
... <xsl:stylesheet version='1.0'
... xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
... xmlns:cml='http://www.xml-cml.org/schema'
... xmlns:str="http://exslt.org/strings"
... extension-element-prefixes="str"
... >
... <xsl:output method="text" />
... <xsl:param name="name" />
... <xsl:param name="value" />
... <xsl:template match="/">
... <xsl:value-of select='$name' /><xsl:value-of select='$value' />
... </xsl:template>
... </xsl:stylesheet>
... </golem:template>
... <golem:seealso>gwtsystem</golem:seealso>
... </entry>
... </dictionary>
... """
>>> d = Dictionary(StringIO(dictionarystring))
>>> xcf = d["{http://www.materialsgrid.org/castep/dictionary}xcFunctional"]
>>> cmlstr = """<?xml version="1.0" encoding="UTF-8"?>
... <?xml-stylesheet href="display.xsl" type="text/xsl"?>
... <cml convention="FoX_wcml-2.0" fileId="NaCl_00GPa.xml" version="2.4"
... xmlns="http://www.xml-cml.org/schema"
... xmlns:castep="http://www.materialsgrid.org/castep/dictionary"
... xmlns:castepunits="http://www.materialsgrid.org/castep/units"
... xmlns:cml="http://www.xml-cml.org/dict/cmlDict"
... xmlns:xsd="http://www.w3.org/2001/XMLSchema"
... xmlns:dc="http://purl.org/dc/elements/1.1/title"
... xmlns:units="http://www.uszla.me.uk/FoX/units"
... xmlns:atomicUnits="http://www.xml-cml.org/units/atomic">
... <metadataList title="Autocaptured metadata">
... <metadata name="dc:date" content="2007-02-09"/>
... </metadataList>
... <parameterList dictRef="input" convention="Input Parameters">
... <parameter dictRef="castep:xcFunctional"
... name="Exchange-Correlation Functional">
... <scalar dataType="xsd:string">PBE</scalar>
... </parameter>
... </parameterList>
... </cml>
... """
>>> tree = etree.parse(StringIO(cmlstr))
>>> xcfd = xcf.findin(tree)
>>> print len(xcfd)
1
>>> xcval = xcf.getvalue(xcfd[0])
>>> print xcf.getvalue(xcfd[0])
PBE
>>> # units are not defined on XCFunctional, so:
>>> print xcval.unit
golem:undefined
>>> # by convention
>>> print xcval.entry.definition
<BLANKLINE>
The exchange-correlation functional used.
<BLANKLINE>
The Entry class represents an entry in a Golem/CML dictionary.
Entries have the following structure:
<entry id="template" term="Template entry">
  <annotation>
    <appinfo><!-- CML-specific machine-processable information --></appinfo>
  </annotation>
  <definition>Human-readable one-liner definition</definition>
  <description>Substantial human-readable documentation</description>
  <metadataList><!-- Dublin Core semantics -->
    <metadata name="dc:creator" content="Test Author" />
  </metadataList>
  <golem:xpath></golem:xpath>
  <golem:template role="role" binding="binding"> <!-- and optionally "@input" -->
  </golem:template>
  <golem:possibleValues type="DATATYPE">
    <golem:range>
      <golem:minimum>1</golem:minimum>
      <golem:maximum>100</golem:maximum>
    </golem:range>
    <!-- or -->
    <golem:enumeration>
      <golem:value>1</golem:value>
      <golem:value>2</golem:value>
      <golem:value>3</golem:value>
    </golem:enumeration>
  </golem:possibleValues> <!-- or matrix ... -->
  <golem:implements>otherEntry</golem:implements> <!-- times n -->
  <golem:synonym>synonymousEntry</golem:synonym> <!-- times n -->
  <golem:seealso>similarEntry</golem:seealso> <!-- times n -->
  <golem:childOf>parentEntry</golem:childOf> <!-- times n -->
</entry>
Internal method (you’ll never call this directly); bounds-check a piece of data and template it into an associated <golem:template> defined in the dictionary. These are mapped onto Python methods named after the name of the <golem:template>.
In other words, this is where entry.to_value calls come from.
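As a rough illustration of the bounds-checking step, the sketch below validates a value against either a range or an enumeration, mirroring the two forms of <golem:possibleValues> shown in the entry structure. The function name check_possible_values and its arguments are hypothetical and are not part of Golem's API; the library's internal check may well differ.

```python
def check_possible_values(value, minimum=None, maximum=None, enumeration=None):
    """Return True if value satisfies the given range or enumeration.

    Hypothetical helper for illustration only.
    """
    if enumeration is not None:
        # <golem:enumeration>: the value must be one of the listed values
        return value in enumeration
    # <golem:range>: each bound is treated as optional in this sketch
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


functionals = ["LDA", "PW91", "PBE", "RPBE"]

in_range = check_possible_values(42, minimum=1, maximum=100)
out_of_range = check_possible_values(101, minimum=1, maximum=100)
known = check_possible_values("PBE", enumeration=functionals)
unknown = check_possible_values("B3LYP", enumeration=functionals)
```

Only once a value has passed a check of this kind would it be handed to the XSLT template for output.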
Map a matrix onto a dictionary for subsequent output using XSLT.
The algorithm used is:
Load a dictionary entry from its XML representation.
arguments: (etree for the entry, parent dictionary object).
Set asModel to true if you’re using this dictionary as a model for building a new one: in that case it retains much more of the native XML, allowing you to serialize it out directly into your new dictionary. At present, this is only used by the dictionary generator (bin/make_dictionary.py in your Golem distribution).
Set a predicate (condition) on a particular Entry instance.
This predicate will be honoured on subsequent calls to x.findin for entry x; it takes the form of an XPath function.
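To show what a predicate does to a search, here is a minimal sketch using the standard library's ElementTree rather than Golem itself. The helper with_predicate and the sample document are assumptions made for illustration. ElementTree supports only a subset of XPath, so a simple child-text test stands in for the XPath function a real predicate would use.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for a CML document
CML = """<cml>
  <parameterList dictRef="input">
    <parameter dictRef="castep:xcFunctional"><scalar>PBE</scalar></parameter>
    <parameter dictRef="castep:xcFunctional"><scalar>LDA</scalar></parameter>
  </parameterList>
</cml>"""


def with_predicate(xpath, predicate=None):
    """Append an XPath predicate to an expression (hypothetical helper)."""
    if predicate is None:
        return xpath
    return "%s[%s]" % (xpath, predicate)


root = ET.fromstring(CML)
base = ".//parameter[@dictRef='castep:xcFunctional']"

# Without a predicate, every matching parameter is returned
everything = root.findall(with_predicate(base))
# With a predicate, only parameters whose scalar child reads 'PBE' remain
only_pbe = root.findall(with_predicate(base, "scalar='PBE'"))
```

The predicate narrows the result set without changing the base expression, which is the effect described for subsequent x.findin calls.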
Load a dictionary from a default location on the filesystem.
On Windows, this is C:\cmldictionaries\ and must be changed by editing golem.py by hand; on Unix, it defaults to ~/.cmldictionaries/ but can be overridden by setting the environment variable CMLDICTIONARIES.
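The Unix lookup rule can be sketched as follows. The function default_dictionary_dir is hypothetical and only restates the rule in code; Golem's own lookup may be implemented differently, and the Windows case is left out because it is hard-coded in golem.py.

```python
import os


def default_dictionary_dir(environ=None):
    """Resolve the Unix dictionary directory (hypothetical helper).

    CMLDICTIONARIES wins if it is set; otherwise fall back to
    ~/.cmldictionaries/ under the user's home directory.
    """
    if environ is None:
        environ = os.environ
    override = environ.get("CMLDICTIONARIES")
    if override:
        return override
    return os.path.expanduser("~/.cmldictionaries/")


# The path /srv/dictionaries is an arbitrary example value
overridden = default_dictionary_dir({"CMLDICTIONARIES": "/srv/dictionaries"})
fallback = default_dictionary_dir({})
```

Passing the environment in as an argument keeps the sketch easy to try out without modifying your real environment.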
Set whether warnings will be emitted when unit/type-bearing data is modified.
Default is True.
Set whether warnings will be emitted when a dictionary Entry without a defined type is used.
Default is True.
For a given Golem value x, with units x.unit and concept x.uri, from URI ‘resource’, produce an RDF/XML fragment of the form:
<rdf:Description rdf:about="resource">
  <dictionary:uri rdf:about="resource#fragment">
    <golem:value datatype="http://example.org/json/">JSON literal
    </golem:value>
    <golem:units>unit</golem:units>
  </dictionary:uri>
</rdf:Description>
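A fragment of this shape could be assembled roughly as shown below. This is a sketch under stated assumptions, not Golem's serializer: the function rdf_fragment, its parameters, and the example resource URL are all hypothetical, and qname stands in for the prefixed element name of the concept (the dictionary:uri placeholder in the template).

```python
import json
from xml.sax.saxutils import escape, quoteattr


def rdf_fragment(resource, fragment, qname, value, unit):
    """Render an RDF/XML fragment shaped like the template above.

    Hypothetical helper for illustration only.
    """
    lines = [
        "<rdf:Description rdf:about=%s>" % quoteattr(resource),
        "  <%s rdf:about=%s>" % (qname, quoteattr(resource + "#" + fragment)),
        # The value is serialized as a JSON literal, then XML-escaped
        '    <golem:value datatype="http://example.org/json/">%s</golem:value>'
        % escape(json.dumps(value)),
        "    <golem:units>%s</golem:units>" % escape(unit),
        "  </%s>" % qname,
        "</rdf:Description>",
    ]
    return "\n".join(lines)


rdf = rdf_fragment("http://example.org/NaCl.xml", "xcFunctional",
                   "castep:xcFunctional", "PBE", "golem:undefined")
```

Escaping the JSON literal and quoting the attribute values keeps the output well-formed even when a value contains markup characters.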
If you’re trying to build a program to interact with a large corpus of CML, this is a good place to start.
Arguments:
Run any attached queries or fits against this dataset and return the results.
graphfile, here, is the path for a Pelote plot, if you wish to make one; output is a file-like object to write output to, and format is the format in which to return results.
Generic representation of a collection of CML data with an attached Golem query (see register) and, optionally, one or more functions to be fitted to the result of that query (see addfunction).
For most practical purposes, you will want to use caching_dataset instead, which inherits from this class, together with the golem.db.fs driver (dbformat in the class signature).
Attach a query (consisting of a pair of lists of Golem concepts, each producing one datapoint per file) to this dataset.
A dataset may only have one query (and therefore set of fits) attached at once; attaching a new query through register loses any previously attached fits or retrieved queries.
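The one-query-at-a-time rule can be pictured with a toy class. ToyDataset is not Golem's implementation, and the argument lists of register and addfunction here are assumptions; the concept names are placeholders. The sketch only demonstrates that registering a new query discards previously attached fits and results.

```python
class ToyDataset(object):
    """Toy stand-in for a dataset; for illustration only."""

    def __init__(self):
        self.query = None
        self.functions = []
        self.results = None

    def register(self, xconcepts, yconcepts):
        # A new query replaces the old one and drops anything derived from it
        self.query = (list(xconcepts), list(yconcepts))
        self.functions = []
        self.results = None

    def addfunction(self, function):
        # Fits are attached to whichever query is currently registered
        self.functions.append(function)


ds = ToyDataset()
ds.register(["volume"], ["energy"])
ds.addfunction(lambda v: v ** 2)
fits_before = len(ds.functions)

# Registering a second query silently discards the fit attached above
ds.register(["pressure"], ["enthalpy"])
fits_after = len(ds.functions)
```

In practice this means any fits you still need should be re-attached after each call to register.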
Concept extracted from a corpus of CML data.
When analysing a corpus of CML data in order to produce a dictionary, every dictRef encountered is mapped to an instance of this class; therefore, this class contains helper methods to output
State that this concept is relatively, not absolutely, positioned.
A relatively-positioned concept occurs in more than one location within the documents in this corpus (as distinguished by XPath expressions); an absolutely-positioned concept always occurs in the same place.
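The distinction can be expressed in a few lines. In this sketch the helper is_relative, the concept castep:energy, and the XPath strings are all made up for illustration; only the rule itself (more than one distinct location means relative) comes from the description above.

```python
def is_relative(xpaths):
    """True if a concept was seen at more than one distinct XPath."""
    return len(set(xpaths)) > 1


# Locations at which each dictRef was encountered across a corpus (invented)
seen = {
    "castep:xcFunctional": [
        "/cml/parameterList/parameter",
        "/cml/parameterList/parameter",
    ],
    "castep:energy": [
        "/cml/module/propertyList/property",
        "/cml/propertyList/property",
    ],
}

positions = dict(
    (ref, "relative" if is_relative(paths) else "absolute")
    for ref, paths in seen.items()
)
```

Repeated sightings at the same XPath do not make a concept relative; only distinct locations do.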
Calculate an XPath expression which identifies an XML node corresponding to the current concept.
This XPath expression ignores the document context in which the concept was found, i.e. "//%s" % (xpath) would find all instances of this concept within a given XML document.
A collection of the concepts found in a corpus of documents we’re analysing.
Build a dictionary.
Arguments:
Write out a CML/Golem dictionary, using model as a source of definitions, descriptions and terms, containing definitions for the entries in concepts and the relationships between concepts defined in groupings.
prefix is the short name to be used in the namespace declaration; the namespace is ns, and the dictionary will be entitled title.
Represents a Pelote graph.
Serialize and write out a Pelote plot to a file or file-like object.
f can be either a string or a file-like object. If f is a string, it is taken to be the filename to use.
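Accepting either a filename or a file-like object is a common pattern, and a minimal version of it looks like this. The function write_plot is hypothetical and the string "<plot/>" is only a placeholder for serialized output; the actual Pelote format is not reproduced here.

```python
import io
import os
import tempfile


def write_plot(serialized, f):
    """Write text to a filename or a file-like object (hypothetical helper)."""
    if isinstance(f, str):
        # A string is taken to be the filename to use
        with open(f, "w") as handle:
            handle.write(serialized)
    else:
        # Anything else is assumed to provide a write() method
        f.write(serialized)


# File-like object
buffer = io.StringIO()
write_plot("<plot/>", buffer)

# Filename in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "plot.xml")
write_plot("<plot/>", path)
with open(path) as handle:
    from_disk = handle.read()
```

When a file-like object is passed in, the caller remains responsible for closing it; the helper only closes files it opened itself.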