Introduction to CENSO

Installation

To install CENSO, clone the GitHub repository. You then have two options:

  1. install via pip,

  2. set up usage via $PATH.

To install in your current environment using pip, run:

pip install .

CENSO can then be run by configuring a custom runner script (see below) or by calling python3 -m censo in the terminal.

Alternatively, after cloning the repository, you can add the bin folder to your $PATH. This directory contains a helper script called censo, which calls the command line entry point of CENSO exactly as python3 -m censo does.

Usage via runner script

In order to call CENSO using a custom runner script, you can use the following example as a template.

from censo.ensembledata import EnsembleData
from censo.configuration import configure
from censo.ensembleopt import Screening, Optimization
from censo.properties import NMR

workdir = "/absolute/path/to/your/workdir" # CENSO will put all files in this directory
input_path = "rel/path/to/your/inputfile" # path relative to the working directory
ensemble = EnsembleData(workdir)
ensemble.read_input(input_path, charge=0, unpaired=0)

# If the user wants to use a specific rcfile:
configure(rcpath="/abs/path/to/rcfile")

# Setup all the parts that the user wants to run (don't need to be in a specific order)
parts = [
    part(ensemble) for part in [Screening, Optimization, NMR]
]

# Run all the parts and collect their runtimes
part_timings = []
for part in parts:
    part_timings.append(part.run())

# If no Exceptions were raised, all the output can now be found in 'workdir'
# Data is given in a formatted plain text format (*.out) and in json format
# The files used in the computations for each conformer can be found in the folders
# generated by each part, respectively (e.g. 'prescreening/CONF2/...')

Requirements

CENSO requires xTB in version 6.4.0 or above. In order to use ORCA, it should be installed in version 4.x or above. It is recommended to use CREST for initial ensemble generation, as well as for better interfacing with ANMR for ensemble NMR spectra calculation.

CENSO requires Python >= 3.10 and has no further dependencies. To use the uvvisplot script, numpy and pandas are also required.

General information

Commandline Energetic SOrting (CENSO) is a sorting algorithm for efficient evaluation of Structure Ensembles (SE).

CENSO can be structured into two components:

  1. Ensemble optimization,

  2. Ensemble property calculations.

The first part (ensemble optimization) can use up to four steps:

  1. Prescreening,

  2. Screening,

  3. (Geometry-)Optimization,

  4. Refinement.

In these steps, the ensemble is optimized, using increasingly accurate settings.

Ensemble properties available for calculation are:

  1. NMR spectra,

  2. UV/Vis spectra.

In the property calculation steps the ensemble is not further modified. However, they require at least one ensemble optimization step to be run beforehand for energy rankings and Boltzmann populations.

For now, all calculations can only be performed using the xTB and ORCA programs.

New features in CENSO 2.0.0

Template files

Since 2.0, CENSO supports template input files for all steps. They are located in $HOME/.censo2_assets. In order to use a template file for e.g. prescreening with ORCA, the file should be called prescreening.orca.template. It should contain two keywords: {main} and {geom}. These are later replaced by the main argument line and the geometry block, respectively. All further settings you add are inserted at the respective positions you put them in the template file.
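
The keyword replacement described above can be illustrated with a short Python sketch. It is not CENSO's own code: the function fill_template, the example template text, and the assumption that {main} receives the complete "!" line while {geom} receives the complete coordinate block are all hypothetical and only show the idea of placeholder substitution.

```python
# Hypothetical illustration of template substitution; not CENSO's actual code.
# Assumption: {main} stands for the full "!" keyword line and {geom} for the
# full coordinate block. Everything else in the template is kept verbatim.
TEMPLATE = """{main}
%maxcore 4000
{geom}
"""


def fill_template(template: str, main: str, geom: str) -> str:
    """Replace the {main} and {geom} keywords, leaving all other text untouched."""
    for keyword in ("{main}", "{geom}"):
        if keyword not in template:
            raise ValueError(f"template is missing the {keyword} keyword")
    return template.replace("{main}", main).replace("{geom}", geom)


main_line = "! PBEh-3c"
geometry = (
    "* xyz 0 1\n"
    "O 0.000 0.000 0.000\n"
    "H 0.000 0.757 0.587\n"
    "H 0.000 -0.757 0.587\n"
    "*"
)
filled = fill_template(TEMPLATE, main_line, geometry)
print(filled)
```

Because the extra %maxcore line sits between the two keywords in the template, it ends up between the main argument line and the geometry block in the generated input.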

Dummy functionals

Since only a limited number of functionals are preconfigured in CENSO, the dummy option exists as a value for func. This tells CENSO not to write any functional-specific settings into the input automatically (such as frozencore for double-hybrids in ORCA). By combining this with a template file, it is also possible to use functionals that are not defined as keywords in ORCA, e.g. revDSD-PBEP86-D4 (J. M. L. Martin et al., J Phys Chem A 2019 doi: 10.1021/acs.jpca.9b03157).

Running from a script

It is possible to run CENSO from a custom runner script. An example might look like this:

import os

from censo.ensembledata import EnsembleData
from censo.configuration import configure
from censo.ensembleopt import Prescreening, Screening, Optimization
from censo.properties import NMR

workdir = "/absolute/path/to/your/workdir" # CENSO will put all files in this directory
input_path = "rel/path/to/your/inputfile" # path relative to the working directory
ensemble = EnsembleData(workdir)
ensemble.read_input(input_path, charge=0, unpaired=0)

# If the user wants to use a specific rcfile:
configure("/abs/path/to/rcfile")

# Get the number of available cpu cores on this machine
# This number can also be set to any other integer value and automatically checked for validity
ncores = os.cpu_count()

# Setup all the parts that the user wants to run
parts = [
    part(ensemble) for part in [Prescreening, Screening, Optimization, NMR]
]

# The user can also choose to change specific settings of the parts
# Please take note of the following:
# - the settings of certain parts, e.g. Prescreening are changed using set_setting(name, value)
# - general settings are changed by using set_general_setting(name, value) (it does not matter which part you call it from)
# - the values you want to set must comply with limits and the type of the setting
Prescreening.set_setting("threshold", 5.0)
Prescreening.set_general_setting("solvent", "dmso")

# It is also possible to use a dict to set multiple values in one step
settings = {
    "threshold": 3.5,
    "func": "pbeh-3c",
    "implicit": True,
}
Screening.set_settings(settings, complete=False)

# Running a part will return its runtime in seconds
part_timings = []
for part in parts:
    # Running the parts in order, while it is also possible to use a custom order or run some parts multiple times
    # Note though, that currently this will lead to results being overwritten in your working directory and
    # the ensembledata object
    part_timings.append(part.run(ncores))

# You access the results using the ensemble object
# You can also find all the results in the <part>.json output files
print(ensemble.conformers[0].results["prescreening"]["sp"]["energy"])

Ensemble Optimization

Prescreening

An ensemble of the most important conformers, generated e.g. with CREST, can contain hundreds of structures. The first step is to improve on the preliminary ranking, which is usually obtained with SQM/FF methods, by using a lightweight DFT method. This should usually already yield significant improvements. If solvation effects are to be included, CENSO will use xtb to calculate the energy of solvation using the ALPB or GBSA solvation model. The threshold for this step should be rather high (up to 10 kcal/mol).
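
As a rough illustration of how such a threshold acts on an ensemble, the following sketch keeps every conformer whose energy lies within the threshold of the lowest-lying one. The function name, the dictionary layout, and the interpretation of the threshold as an energy window relative to the best conformer are assumptions made for this example; CENSO's internal bookkeeping may differ.

```python
# Hypothetical sketch of an energy-window cut; not taken from the CENSO sources.
def apply_energy_window(energies_kcal: dict[str, float], threshold: float) -> list[str]:
    """Return the conformers within `threshold` kcal/mol of the lowest one, sorted by energy."""
    lowest = min(energies_kcal.values())
    kept = [name for name, energy in energies_kcal.items() if energy - lowest <= threshold]
    return sorted(kept, key=lambda name: energies_kcal[name])


# Made-up relative energies in kcal/mol
energies = {"CONF1": 0.0, "CONF2": 2.3, "CONF3": 11.8, "CONF4": 9.9}
survivors = apply_energy_window(energies, threshold=10.0)
print(survivors)
```

With a window of 10 kcal/mol only CONF3 is dropped here; a generous window at this stage avoids discarding conformers whose ranking may still change with better methods.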

Screening

After prescreening, this step further improves the ranking by using a higher-quality DFT method. In this step one may also choose to include thermal contributions to the free enthalpy by activating evaluate_rrho, in which case CENSO uses xtb to calculate single-point Hessians. This will also include solvation if the user chose to do so. The threshold for this step should be lower than before (up to 7.5 kcal/mol) to account for the decreasing uncertainty due to improvements in the ranking method. CENSO will increase the threshold by up to 1 kcal/mol, proportional to the (exponential of the) standard deviation of the thermal contributions. If explicitly requested, the solvation contributions are calculated using DFT; note that this doubles the computational effort, since two single-point calculations are required.
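
The threshold adjustment can be pictured with a small sketch. The exact expression CENSO uses is not given in this text, so the saturating form below, base + max_increase * (1 - exp(-sigma)), is only one plausible choice that never adds more than 1 kcal/mol. The function name and its arguments are likewise hypothetical.

```python
# Hypothetical functional form; the expression actually used by CENSO may differ.
import math
import statistics


def adjusted_threshold(base: float, grrho_kcal: list[float], max_increase: float = 1.0) -> float:
    """Raise `base` by at most `max_increase`, depending on the spread of the thermal contributions."""
    # The sample standard deviation needs at least two values
    sigma = statistics.stdev(grrho_kcal) if len(grrho_kcal) > 1 else 0.0
    # 1 - exp(-sigma) is 0 for sigma = 0 and approaches 1 for large sigma
    return base + max_increase * (1.0 - math.exp(-sigma))


uniform = adjusted_threshold(7.5, [1.0, 1.0, 1.0])
spread = adjusted_threshold(7.5, [0.0, 2.0, 4.0])
print(uniform, spread)
```

If all thermal contributions are identical the threshold stays at its base value, while a large spread pushes it towards, but never beyond, the base value plus 1 kcal/mol.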

Optimization

To further improve the ranking, the geometries of the conformers in this step will be optimized using DFT gradients. For this, the xtb optimizer will be used as driver. Solvation effects will be included implicitly. Furthermore, thermal contributions will be included for the ranking if evaluate_rrho is set to True. One can also utilize a macrocycle optimizer in CENSO (set macrocycle to True). This will run a number (optcycles) of geometry optimization steps (microcycles) for every macrocycle and update the ensemble every macrocycle. The single-point Hessian evaluation using xtb will take place once after at least 6 microcycles and once after finishing the last macrocycle. The energy threshold for this step is based on a minimum threshold (threshold) and TODO This threshold will be applied once the gradient norm of a conformer is below a specified threshold (gradthr) for all the microcycles in the current macrocycle.

It is also possible to use xtb-constraints for this step. The constraints should be provided as a file called constraints.xtb in the working directory. Also, the constrain option for the optimization part should be set to True.

Refinement

After geometry optimization of the ensemble, a high-level DFT calculation should be performed to obtain highly accurate single-point energies. In this step, the threshold is also more rigorous, using a Boltzmann population cutoff. After the high-level single-point calculations, the populations (in %) of the conformers are sorted from highest to lowest and summed up until the defined threshold is reached; all remaining conformers are removed from consideration.
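
The population cutoff can be sketched in a few lines of Python. This is a minimal illustration, not CENSO's implementation: the function names are hypothetical, populations are handled as fractions rather than percent, and the temperature of 298.15 K is an assumed default.

```python
# Hypothetical sketch of a cumulative Boltzmann population cutoff.
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def boltzmann_populations(rel_energies_kcal: dict[str, float], temperature: float = 298.15) -> dict[str, float]:
    """Normalized Boltzmann weights from relative (free) energies in kcal/mol."""
    weights = {
        name: math.exp(-energy / (R_KCAL * temperature))
        for name, energy in rel_energies_kcal.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def population_cutoff(populations: dict[str, float], threshold: float) -> list[str]:
    """Keep the most populated conformers until their summed population reaches `threshold`."""
    kept: list[str] = []
    cumulative = 0.0
    for name, population in sorted(populations.items(), key=lambda item: item[1], reverse=True):
        if cumulative >= threshold:
            break
        kept.append(name)
        cumulative += population
    return kept


# Made-up relative energies in kcal/mol
populations = boltzmann_populations({"CONF1": 0.0, "CONF2": 0.5, "CONF3": 3.0})
kept = population_cutoff(populations, threshold=0.95)
print(kept)
```

In this example CONF1 and CONF2 together already account for more than 95 % of the population, so CONF3 is removed.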

Ensemble Properties

NMR Spectra

For the calculation of the NMR spectrum of an ensemble, single-points to compute the nuclear shieldings and couplings will be executed. The computational parameters for shieldings and couplings can be set to different values. In this case two separate single-points will be run. If the settings are identical, only one single-point will be run for both. After that, CENSO will generate files for the simulation of the NMR spectrum using ANMR. Please note that the user needs to set up the .anmrrc file.

For more detailed instructions see Calculation of NMR Spectra.

UV/Vis Spectra

To calculate the ensemble UV/Vis spectrum, CENSO will run single-points to calculate the excitation wavelengths and oscillator strengths using TD-DFT. For this, it is important to choose an appropriate number of roots sought (nroots). After finishing, CENSO will output the population-weighted excitation parameters to excitations.out in tabular format and to excitations.json for convenience. The table contains all weighted excitation wavelengths together with their maximum extinction coefficients and the originating conformer.

To plot the spectra, the tool uvvisplot provided in the bin directory (where the runner helper is also located) can be used. It needs to be provided with a file of the same structure as excitations.json. It outputs a file called contributions.csv which contains all Gaussian signals partitioned by conformer and state.
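
To give an idea of what a sum of Gaussian signals looks like, the sketch below broadens a list of excitations into a spectrum. It does not reproduce uvvisplot: the line shape, the width of 15 nm, the field names, and the numbers are all assumptions chosen for illustration, and the real layout of excitations.json may differ.

```python
# Hypothetical sketch of Gaussian broadening; uvvisplot's line shape may differ.
import math


def gaussian_band(x_nm: float, center_nm: float, eps_max: float, width_nm: float) -> float:
    """Gaussian centred at `center_nm` whose maximum value is `eps_max`."""
    return eps_max * math.exp(-(((x_nm - center_nm) / width_nm) ** 2))


def spectrum(x_nm: float, excitations: list[dict], width_nm: float = 15.0) -> float:
    """Sum the Gaussian signals of all excitations at wavelength `x_nm`."""
    return sum(
        gaussian_band(x_nm, exc["wavelength"], exc["eps_max"], width_nm)
        for exc in excitations
    )


# Made-up, already population-weighted excitations
excitations = [
    {"conformer": "CONF1", "wavelength": 280.0, "eps_max": 1200.0},
    {"conformer": "CONF2", "wavelength": 350.0, "eps_max": 300.0},
]

for x in range(200, 401, 50):
    print(x, round(spectrum(float(x), excitations), 1))
```

Keeping one Gaussian per conformer and state, as in the contributions.csv file, makes it possible to see which conformer is responsible for a given band.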