Summon is a utility to enable researchers to extract data from one or more CML files and collate the results as a comma-separated list of data, one line per file.
Summon is installed as part of Golem. If you have installed Golem using the make approach outlined earlier, you’ll find summon in /usr/local/bin; however, if you have used setup.py or easy_install, you may need to add summon to your path.
For example, on OS X:
$ which summon
/opt/local/Library/Frameworks/Python.framework/Versions/2.4/bin/summon
Similarly, on Windows and Unix machines, the summon script will be installed in your site-wide Python scripts directory (the same location where easy_install is found); on Windows this is typically C:\Python25\Scripts.
Windows and Unix have different approaches to deciding which programs should be executable; this makes it difficult to install the utilities which ship with Golem as executables out of the box. So, on Windows, assuming you’ve added your Python install directory to your path, please substitute:
c:\mycmldata\> python c:\Python25\Scripts\summon
(again, assuming Python is installed in c:\Python25\; if it’s installed somewhere else, substitute that path in too) for:
$ summon
when running the examples.
Summon comes with a built-in help message:
$ summon --help
usage: summon options file1.xml [file2.xml ...]

options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -t TERM, --term=TERM  terms to look up
  -d DICTIONARY, --dictionary=DICTIONARY
                        dictionary to use
  -c CONFIG, --config=CONFIG
                        config file to use
  -f, --final           take only last value in file?
  -o OUTFILE, --outfile=OUTFILE
                        dump output to csv file
To explain how summon works, we start by taking an example.
$ summon -t numbers_of_species -d rmcprofileDict.xml ag3cocn6_300k.xml
Here, we’re extracting the numbers of the different atomic species in a simulation (ag3cocn6_300k.xml), which are represented by the term numbers_of_species in the CML/Golem dictionary rmcprofileDict.xml. We cover the development of dictionaries later, but for now it’s sufficient to know that a dictionary contains a list of terms and metadata on how to locate and manipulate the data the terms refer to.
The result of this call will look something like:
numbers_of_species
"[864, 288, 1728, 1728]"
where the first line gives the term name, and the four numbers on the second line are the numbers of atoms of each of the four species in the simulation file (from the name of the file you can guess that these are Ag, Co, C and N). To extract multiple quantities at once, specify each separately using -t TERM, where TERM is the id of the concept in the dictionary or Summon configuration file; quantities from the same file will be written to the same line in the output.
You can pass multiple XML files on the command line, in which case you get one line of data per file. However, the output itself does not record which line came from which file, so if you need that link you must extract a quantity that identifies each file unambiguously. Suppose we performed a large number of simulations using the OSSIA code, each at a different temperature. If we want to extract the value of a quantity called energy from each file and plot it against temperature (both terms being defined in ossiaDict.xml), and assuming the CML output by OSSIA is in the current working directory, we would use the following summon command, which contains multiple instances of the -t option:
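If you post-process this output in Python, note that the array arrives as a single quoted CSV field. The following is a minimal sketch, assuming the output has exactly the shape of the sample above (one line of term names, then one line per file); parse_summon_output is a hypothetical helper name, not part of Golem.

```python
import ast
import csv
import io


def parse_summon_output(text):
    """Parse summon-style CSV text into a list of {term: value} dicts.

    Hypothetical helper: assumes a header line of term names followed by
    one line of values per input file.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line: the term names
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        # literal_eval turns "[864, 288]" into a list and "300.0" into a float;
        # it raises ValueError/SyntaxError for fields that are not Python literals
        rows.append({name: ast.literal_eval(field)
                     for name, field in zip(header, fields)})
    return rows


sample = 'numbers_of_species\n"[864, 288, 1728, 1728]"\n'
rows = parse_summon_output(sample)
print(rows[0]["numbers_of_species"])
```

The csv module removes the surrounding quotes, so the embedded commas inside the array do not split the field.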
$ summon -t temperature -t energy -d /PATH/TO/DICTIONARIES/ossiaDict.xml *.xml
As you can see, in practice we usually don’t need the file names, only the assurance that temperatures and energies match up. Here they do: the temperature and energy on any one line come from the same file.
Any term in a dictionary with a defined mapping from CML to a Golem object can be extracted in this manner - basically, any entry that directly contains, or is, a <scalar>, <array>, <matrix>, <lattice>, <atomArray>, <metadata> or <cellParameter>. Concepts where this mapping is not defined will raise an error. Looking in the dictionary, you can spot these because they do not contain lines like:
<golem:template call="scalar" role="getvalue" binding="pygolem_serialization" />
To route output to a file as well as to the console, use the -o (or --outfile=) option:
$ summon -t temperature -t energy -d ossiaDict.xml -o output.csv *.xml
This produces a CSV file, output.csv, ready to import into your favourite spreadsheet.
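If you would rather read output.csv in Python than in a spreadsheet, a short sketch follows. It assumes the file starts with a header line naming the terms (temperature,energy) and that scalar values are written as plain numbers; load_pairs is a hypothetical helper, and the numbers below are invented placeholders, not OSSIA results.

```python
import csv
import io


def load_pairs(handle, x_term="temperature", y_term="energy"):
    """Return (x, y) float pairs, sorted by x, from a summon-style CSV."""
    reader = csv.DictReader(handle)  # uses the first line as column names
    pairs = [(float(row[x_term]), float(row[y_term])) for row in reader]
    return sorted(pairs)


# Placeholder content standing in for output.csv (values are made up).
fake_csv = io.StringIO("temperature,energy\n300,-1.5\n100,-2.5\n200,-2.0\n")
pairs = load_pairs(fake_csv)
print(pairs)

# With a real file:
#     with open("output.csv", newline="") as handle:
#         pairs = load_pairs(handle)
```

Sorting by temperature matters because the order of lines follows the order in which the shell expands *.xml, which need not be the order of the temperatures.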
Sometimes the concepts you wish to extract are not defined in the CML/Golem dictionary for your code. This could be for a number of reasons: the concept may be too specific to belong in the dictionary, such as a particular bond length (which would only exist in systems of that kind), or markup for the concept may have been added to your code recently and the dictionary has not yet been updated to match. In that case, you may need to write a Summon configuration file to define those concepts. Let’s say that the numbers_of_species concept, used earlier, is missing from rmcprofileDict.xml; we therefore need to supply a definition.
We do that by adding entries to a configuration file (say, rmcprofile.cfg):
[numbers_of_species]
type: array
xpath: //cml:parameterList/cml:parameter[@dictRef="rmcprofile:numbers_of_species"]
where the first line gives the name of the concept ([numbers_of_species]), type gives the type of the data (here array, taken from the list of types that Golem, and therefore summon, understands, as given earlier), and xpath gives the XPath expression which points to the bit of CML we want to extract. (The CML namespace is defined to always be http://www.xml-cml.org/schema, so you don’t need to worry about declaring that.) The fragment of CML that this refers to is:
<parameterList>
<parameter dictRef="rmcprofile:numbers_of_species" name="Numbers of each species">
<array size="4" dataType="xsd:integer" units="cmlUnits:countable">864 288 1728 1728</array>
</parameter>
</parameterList>
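To see what the XPath expression selects, you can try it against the fragment with Python's standard library. This is only an illustration of the XPath, not of how summon evaluates it internally. It assumes the fragment sits inside a root element (called cml here for the example) that declares the CML namespace; ElementTree also requires a relative path, so the leading // becomes .// below.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping for the "cml:" prefix used in the XPath.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# The fragment above, wrapped in a root element that declares the CML
# namespace. The wrapper element is an assumption made for this example.
doc = """<cml xmlns="http://www.xml-cml.org/schema">
  <parameterList>
    <parameter dictRef="rmcprofile:numbers_of_species" name="Numbers of each species">
      <array size="4" dataType="xsd:integer" units="cmlUnits:countable">864 288 1728 1728</array>
    </parameter>
  </parameterList>
</cml>"""

root = ET.fromstring(doc)

# ElementTree only accepts relative paths, hence ".//" instead of "//".
xpath = ".//cml:parameterList/cml:parameter[@dictRef='rmcprofile:numbers_of_species']"
parameter = root.find(xpath, CML_NS)

# The data lives in the <array> child as whitespace-separated integers.
array = parameter.find("cml:array", CML_NS)
counts = [int(token) for token in array.text.split()]
print(counts)
```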
Thus, you can easily define extra terms when you need them. If you wish to rename or inherit terms from elsewhere (say, a dictionary that came with Golem), that is also possible. Again using numbers_of_species as an example, and assuming the rmcprofile dictionary is available, you can define the term as follows:
[numbers_of_species]
dictRef: {http://www.esc.cam.ac.uk/rmcprofile}numbers_of_species
In general, dictRef consists of the namespace of the dictionary you wish to use and the id of the term within that dictionary you want. Alternatively, you can import entire dictionaries into your configuration with the following declaration:
[global]
dictionary: /path/to/dictionary
This makes every term in the dictionary available in this configuration file; in other words, it is exactly equivalent to entering into your configuration file
[term]
dictRef: {http://dictionary.namespace/}term
for every entry in the dictionary. However, if you explicitly define a term in the config file, that definition will be used even if it has been loaded from a dictionary already. If a Summon configuration file contains multiple [global] sections, and some of those dictionaries contain the same term, then the first-loaded dictionary wins.
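The two precedence rules can be modelled in a few lines of Python. This is a sketch of the rules as described here, not Summon's actual implementation; resolve_terms and the placeholder definitions are hypothetical.

```python
def resolve_terms(explicit, dictionaries):
    """Merge term definitions following the precedence rules described above.

    explicit:     {term: definition} from [term] sections in the config file.
    dictionaries: list of {term: definition}, in the order they were loaded.
    """
    resolved = {}
    for terms in dictionaries:
        for name, definition in terms.items():
            # setdefault keeps the first definition seen: first-loaded wins
            resolved.setdefault(name, definition)
    # explicit definitions override anything that came from a dictionary
    resolved.update(explicit)
    return resolved


# Placeholder definitions: the strings just record where each one came from.
first = {"energy": "first-dict", "temperature": "first-dict"}
second = {"energy": "second-dict", "pressure": "second-dict"}
explicit = {"temperature": "config-file"}

resolved = resolve_terms(explicit, [first, second])
print(resolved)
```

Here energy comes from the first dictionary even though the second also defines it, and temperature comes from the configuration file because an explicit definition always takes priority.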
Summon configuration files are used in the same way as dictionaries, except that you load them with the -c command line option, not -d. For example, using rmcprofile.cfg from earlier:
$ summon -t numbers_of_species -c rmcprofile.cfg ag3cocn6_300k.xml
will generate the same output as the dictionary-using approach given earlier.
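Because the configuration file has an INI-like layout, Python's standard configparser module can read it, which is handy for checking a file before passing it to summon. Whether Summon itself parses the file this way is not stated here, so treat the following as an independent sanity check under that assumption.

```python
import configparser

# The contents of rmcprofile.cfg from earlier, inlined so the example is
# self-contained.
CONFIG_TEXT = """\
[numbers_of_species]
type: array
xpath: //cml:parameterList/cml:parameter[@dictRef="rmcprofile:numbers_of_species"]
"""

# interpolation=None leaves values untouched; optionxform=str keeps the
# case of keys, so a key such as "dictRef" would not be lowercased.
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str
parser.read_string(CONFIG_TEXT)

# Each section name is a term; its keys describe how to find the data.
for term in parser.sections():
    print(term, dict(parser[term]))
```

Only the first colon on each line separates key from value, so the colons and equals sign inside the XPath expression are preserved.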
In summary, Summon configuration files are a way of constructing new CML/Golem dictionaries on the fly as you need them, without going to all the effort of writing the XML by hand.