Introduction to CENSO
Installation
To install CENSO, clone the GitHub repository. You then have two options: install via pip, or set up usage via $PATH.
To install in your current environment using pip, run:
pip install .
CENSO can then be run by configuring a custom runner script (see below) or by calling python3 -m censo in the terminal.
Alternatively, after cloning the repository, you can add the bin folder to your $PATH.
This directory contains a helper script called censo, which calls the command-line entry point of CENSO
as if you had run python3 -m censo.
Usage via runner script
In order to call CENSO using a custom runner script, you can use the following example as a template.
from censo.ensembledata import EnsembleData
from censo.configuration import configure
from censo.ensembleopt import Screening, Optimization
from censo.properties import NMR
workdir = "/absolute/path/to/your/workdir" # CENSO will put all files in this directory
input_path = "rel/path/to/your/inputfile" # path relative to the working directory
ensemble = EnsembleData(workdir)
ensemble.read_input(input_path, charge=0, unpaired=0)
# If the user wants to use a specific rcfile:
configure(rcpath="/abs/path/to/rcfile")
# Setup all the parts that the user wants to run (don't need to be in a specific order)
parts = [
    part(ensemble) for part in [Screening, Optimization, NMR]
]
# Run all the parts and collect their runtimes
part_timings = []
for part in parts:
    part_timings.append(part.run())
# If no Exceptions were raised, all the output can now be found in 'workdir'
# Data is given in a formatted plain text format (*.out) and in JSON format
# The files used in the computations for each conformer can be found in the folders
# generated by each part, respectively (e.g. 'prescreening/CONF2/...')
Requirements
CENSO requires xTB in version 6.4.0 or above. In order to use ORCA, it should be installed in version 4.x or above. It is recommended to use CREST for initial ensemble generation, as well as for better interfacing with ANMR for ensemble NMR spectra calculation.
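Before a first run it can help to confirm that the external programs are reachable. The following is a minimal sketch, not part of CENSO: the function name check_requirements is hypothetical, and the executable names xtb, orca and crest are assumed to match your installation. It only checks that the programs are found on $PATH and does not verify their versions.

```python
import shutil
import sys


def check_requirements(programs=("xtb", "orca", "crest")):
    """Report whether Python is recent enough and where each program is found."""
    # CENSO requires Python >= 3.10
    report = {"python_ok": sys.version_info >= (3, 10)}
    for name in programs:
        # shutil.which returns the full path of the executable,
        # or None if it cannot be found on $PATH
        report[name] = shutil.which(name)
    return report


for key, value in check_requirements().items():
    print(f"{key}: {value}")
```

A value of None for a program means it was not found on $PATH. The version requirements stated above still need to be checked by hand.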
CENSO requires Python >= 3.10 and has no further dependencies. To use the uvvisplot script, numpy and pandas are required.
General information
Commandline Energetic SOrting (CENSO) is a sorting algorithm for efficient evaluation of Structure Ensembles (SE).
CENSO can be structured into two components:
Ensemble optimization,
Ensemble property calculations.
The first part (ensemble optimization) can use up to four steps:
Prescreening,
Screening,
(Geometry-)Optimization,
Refinement.
In these steps, the ensemble is optimized, using increasingly accurate settings.
Ensemble properties available for calculation are:
NMR spectra,
UV/Vis spectra.
In the property calculation steps the ensemble is not further modified. However, they require at least one ensemble optimization step to be run beforehand for energy rankings and Boltzmann populations.
For now, all calculations can only be performed using the xTB and ORCA programs.
New features in CENSO 2.0.0
Template files
Since 2.0, CENSO supports template input files for all steps. They are located in $HOME/.censo2_assets.
In order to use a template file for e.g. prescreening with ORCA, the file should be called prescreening.orca.template.
It should contain two keywords: {main} and {geom}. These are later replaced by the main argument line and the geometry
block, respectively. All further settings you add are inserted at the respective positions you put them in the
template file.
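To illustrate the idea of the two keywords, the following sketch fills a small ORCA-style template by plain string replacement. It is not CENSO's actual implementation: the function fill_template, the example main line and the example geometry are assumptions made for illustration only.

```python
def fill_template(template_text: str, main_line: str, geom_block: str) -> str:
    """Replace the {main} and {geom} keywords of a template with concrete input."""
    for keyword in ("{main}", "{geom}"):
        if keyword not in template_text:
            raise ValueError(f"template is missing the keyword {keyword}")
    # Plain replacement leaves any other braces in the template untouched
    return template_text.replace("{main}", main_line).replace("{geom}", geom_block)


# Hypothetical template content; the %maxcore line stays where the user put it
template = "{main}\n%maxcore 4000\n{geom}\n"
main_line = "! PBEh-3c"
geom_block = "* xyz 0 1\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74\n*"

print(fill_template(template, main_line, geom_block))
```

The additional %maxcore line ends up between the main argument line and the geometry block, at the position it had in the template.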
Dummy functionals
Since only a limited number of functionals are preconfigured in CENSO, the dummy option exists as a value
for func. This tells CENSO not to write any functional-specific settings into the input automatically (such as
frozencore for double-hybrids in ORCA). By combining this with a template file, it is also possible to use
functionals that are not defined as keywords in ORCA, e.g. revDSD-PBEP86-D4 (J. M. L. Martin et al., J. Phys. Chem. A 2019,
doi: 10.1021/acs.jpca.9b03157).
Running from a script
It is possible to run CENSO from a custom runner script. An example might look like this:
import os
from censo.ensembledata import EnsembleData
from censo.configuration import configure
from censo.ensembleopt import Prescreening, Screening, Optimization
from censo.properties import NMR
workdir = "/absolute/path/to/your/workdir" # CENSO will put all files in this directory
input_path = "rel/path/to/your/inputfile" # path relative to the working directory
ensemble = EnsembleData(workdir)
ensemble.read_input(input_path, charge=0, unpaired=0)
# If the user wants to use a specific rcfile:
configure("/abs/path/to/rcfile")
# Get the number of available cpu cores on this machine
# This number can also be set to any other integer value and automatically checked for validity
ncores = os.cpu_count()
# Setup all the parts that the user wants to run
parts = [
    part(ensemble) for part in [Prescreening, Screening, Optimization, NMR]
]
# The user can also choose to change specific settings of the parts
# Please take note of the following:
# - the settings of certain parts, e.g. Prescreening are changed using set_setting(name, value)
# - general settings are changed by using set_general_setting(name, value) (it does not matter which part you call it from)
# - the values you want to set must comply with limits and the type of the setting
Prescreening.set_setting("threshold", 5.0)
Prescreening.set_general_setting("solvent", "dmso")
# It is also possible to use a dict to set multiple values in one step
settings = {
    "threshold": 3.5,
    "func": "pbeh-3c",
    "implicit": True,
}
Screening.set_settings(settings, complete=False)
# Running a part will return its runtime in seconds
part_timings = []
for part in parts:
    # The parts are run in order here; it is also possible to use a custom order or run some parts multiple times
    # Note, though, that this currently leads to results being overwritten in your working directory and
    # the ensembledata object
    part_timings.append(part.run(ncores))
# You can access the results using the ensemble object
# You can also find all the results in the <part>.json output files
print(ensemble.conformers[0].results["prescreening"]["sp"]["energy"])
Ensemble Optimization
Prescreening
After generating an ensemble of the most important conformers, e.g. using CREST (their number
can range in the hundreds), the first step is to improve on the preliminary ranking using a
lightweight DFT method. This should usually already yield significant improvements over the
preliminary ranking, which is typically obtained using SQM/FF methods.
If solvation effects should be included, CENSO will use xtb to calculate the energy of
solvation using the ALPB or GBSA solvation model. The threshold
for this step should be rather high (up to 10 kcal/mol).
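The effect of an energy threshold can be sketched as follows. This is an illustrative example only, not CENSO's code: the function apply_threshold and the conformer energies are made up, and the energies are assumed to be given in Hartree.

```python
HARTREE_TO_KCAL = 627.509474  # conversion factor from Hartree to kcal/mol


def apply_threshold(energies_hartree: dict[str, float], threshold_kcal: float) -> list[str]:
    """Keep all conformers within threshold_kcal of the lowest-lying conformer."""
    lowest = min(energies_hartree.values())
    kept = [
        name
        for name, energy in energies_hartree.items()
        if (energy - lowest) * HARTREE_TO_KCAL <= threshold_kcal
    ]
    # Return the surviving conformers ranked from lowest to highest energy
    return sorted(kept, key=lambda name: energies_hartree[name])


# Hypothetical energies in Hartree
energies = {"CONF1": -100.000, "CONF2": -99.995, "CONF3": -99.980}
print(apply_threshold(energies, threshold_kcal=10.0))
```

With these numbers, CONF2 lies about 3.1 kcal/mol and CONF3 about 12.6 kcal/mol above CONF1, so a threshold of 10 kcal/mol keeps CONF1 and CONF2.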
Screening
After prescreening the ensemble in the first step, this step is supposed to further
improve the ranking by increasing the quality of the DFT method used.
In this step one may also choose to include thermal contributions to the free enthalpy
by activating evaluate_rrho, which will lead to CENSO using xtb to calculate
single-point Hessians. This will also include solvation if the user chose to do so.
The threshold for this step should be lower than before (up to 7.5 kcal/mol) to account
for the decreasing uncertainty due to improvements in the ranking method. CENSO will
increase the threshold by up to 1 kcal/mol, proportional to the (exponential of the)
standard deviation of the thermal contributions. The solvation contributions will be
calculated using DFT. If the solvation contribution is required explicitly, the
computational effort doubles, because two single-point calculations are needed.
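The threshold increase can be pictured with a saturating function of the standard deviation. The exact expression used by CENSO is not given in this document, so the form 1 - exp(-sigma) below is purely an assumption chosen because it stays between 0 and 1 kcal/mol; the function name widened_threshold is hypothetical as well.

```python
import math
import statistics


def widened_threshold(base_threshold: float, thermal_contributions_kcal: list[float]) -> float:
    """Increase the threshold by at most 1 kcal/mol, depending on the spread."""
    if len(thermal_contributions_kcal) < 2:
        # A standard deviation needs at least two values
        return base_threshold
    spread = statistics.stdev(thermal_contributions_kcal)
    # Assumed functional form: 0 for no spread, approaching 1 for a large spread
    increase = 1.0 - math.exp(-spread)
    return base_threshold + increase


# Hypothetical thermal contributions in kcal/mol
print(widened_threshold(3.5, [10.2, 10.9, 11.4]))
```

If all thermal contributions are identical, the threshold is unchanged. A large spread pushes the increase towards, but never beyond, 1 kcal/mol.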
Optimization
To further improve the ranking, the geometries of the conformers in this step will be
optimized using DFT gradients. For this, the xtb optimizer will be used as driver.
Solvation effects will be included implicitly. Furthermore, thermal contributions will
be included for the ranking if evaluate_rrho is set to True. One can also utilize
a macrocycle optimizer in CENSO (set macrocycle to True). This will run a number
(optcycles) of geometry optimization steps (microcycles) for every macrocycle and
update the ensemble every macrocycle. The single-point Hessian evaluation using xtb
will take place once after at least 6 microcycles and once after finishing the last
macrocycle. The energy threshold for this step is based on a minimum threshold (threshold)
and TODO.
This threshold will be applied once the gradient norm of a conformer is below a
specified threshold (gradthr) for all the microcycles in the current macrocycle.
It is also possible to use xtb constraints for this step. The constraints should be
provided as a file called constraints.xtb in the working directory. Also, the
constrain option for the optimization part should be set to True.
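The gradient criterion described above can be sketched as a simple check over the microcycles of one macrocycle. This is an illustration under assumptions, not CENSO's implementation: the function names, the data layout and the numerical values are hypothetical.

```python
def is_converged(gradient_norms: list[float], gradthr: float) -> bool:
    """True if the gradient norm stayed below gradthr in every microcycle."""
    return len(gradient_norms) > 0 and all(norm < gradthr for norm in gradient_norms)


def conformers_ready_for_threshold(history: dict[str, list[float]], gradthr: float) -> list[str]:
    """Select the conformers to which the energy threshold may be applied."""
    return [name for name, norms in history.items() if is_converged(norms, gradthr)]


# Hypothetical gradient norms for the microcycles of the current macrocycle
history = {
    "CONF1": [0.004, 0.003, 0.002],
    "CONF2": [0.020, 0.004, 0.003],  # first microcycle is still above gradthr
}
print(conformers_ready_for_threshold(history, gradthr=0.01))  # ['CONF1']
```

In this sketch a single microcycle above gradthr is enough to exempt a conformer from the energy threshold for the current macrocycle.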
Refinement
After geometry optimization of the ensemble, a high-level DFT calculation should be performed to obtain highly accurate single-point energies. In this step, the threshold is also more rigorous, using a Boltzmann population cutoff: the populations (in %) of the conformers after the high-level single-point calculation are sorted from highest to lowest and summed up until the defined threshold is reached. All remaining conformers are removed from consideration.
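The population cutoff can be sketched in a few lines. The example below is an assumption-laden illustration rather than CENSO's code: the function names and the relative free energies are made up, and populations are handled as fractions between 0 and 1 instead of percent.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def boltzmann_populations(rel_energies_kcal: dict[str, float], temperature: float = 298.15) -> dict[str, float]:
    """Convert relative (free) energies in kcal/mol into Boltzmann populations."""
    rt = GAS_CONSTANT_KCAL * temperature
    weights = {name: math.exp(-energy / rt) for name, energy in rel_energies_kcal.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def population_cutoff(populations: dict[str, float], threshold: float) -> list[str]:
    """Keep the most populated conformers until their summed population reaches threshold."""
    kept = []
    cumulative = 0.0
    for name, population in sorted(populations.items(), key=lambda item: item[1], reverse=True):
        if cumulative >= threshold:
            break  # threshold reached, all further conformers are dropped
        kept.append(name)
        cumulative += population
    return kept


# Hypothetical relative free energies in kcal/mol
populations = boltzmann_populations({"CONF1": 0.0, "CONF2": 0.5, "CONF3": 3.0})
print(population_cutoff(populations, threshold=0.95))
```

For these values CONF1 and CONF2 together hold more than 99 % of the population, so CONF3 is removed at a threshold of 95 %.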
Ensemble Properties
NMR Spectra
For the calculation of the NMR spectrum of an ensemble, single-points to compute the
nuclear shieldings and couplings will be executed. The computational parameters for shieldings
and couplings can be set to different values. In this case two separate single-points
will be run. If the settings are identical, only one single-point will be run for both.
After that, CENSO will generate files for the simulation of the NMR spectrum using ANMR.
Please note that the user needs to set up the .anmrrc file.
For more detailed instructions see Calculation of NMR Spectra.
UV/Vis Spectra
To calculate the ensemble UV/Vis spectrum, CENSO will run single-points to calculate the excitation
wavelengths and oscillator strengths using TD-DFT. For this, it is important to choose an appropriate
number of roots sought (nroots). After finishing, CENSO will output the population-weighted
excitation parameters to excitations.out in tabular format and to excitations.json for convenience.
The table contains all weighted excitation wavelengths together with their maximum extinction coefficients
and the originating conformer.
To plot the spectra, the tool uvvisplot provided in the bin directory (where the runner helper is also located)
can be used. It needs to be provided with a file of the same structure as excitations.json.
It outputs a file called contributions.csv, which contains all Gaussian signals partitioned by conformer and state.
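How Gaussian signals add up to a spectrum can be sketched as follows. This is a generic illustration and not the uvvisplot implementation: the function gaussian_spectrum, the line width and the excitation data are assumptions, and the broadening is applied on the wavelength axis for simplicity.

```python
import math


def gaussian_spectrum(excitations: list[tuple[float, float]], grid: list[float], width: float = 20.0) -> list[float]:
    """Sum one Gaussian per excitation on a wavelength grid.

    excitations: (wavelength in nm, population-weighted maximum intensity) pairs
    width: standard deviation of each Gaussian in nm (assumed value)
    """
    spectrum = []
    for x in grid:
        total = 0.0
        for wavelength, intensity in excitations:
            # Each Gaussian reaches its maximum intensity at its own wavelength
            total += intensity * math.exp(-0.5 * ((x - wavelength) / width) ** 2)
        spectrum.append(total)
    return spectrum


# Hypothetical excitations: (wavelength in nm, weighted maximum intensity)
excitations = [(280.0, 1.5), (320.0, 0.7)]
grid = [float(x) for x in range(200, 401, 10)]
for x, y in zip(grid, gaussian_spectrum(excitations, grid)):
    print(f"{x:6.1f} {y:8.4f}")
```

Keeping the individual Gaussians separate per conformer and state, instead of summing them, would correspond to the kind of partitioned data described for contributions.csv.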