Regularizers

This page describes KlFunctionInfo and Regularizer classes.

See Regularizers Description for a detailed explanation of what each regularizer does.

class artm.KlFunctionInfo(function_type='log', power_value=2.0)

__init__(function_type='log', power_value=2.0)

Parameters:

function_type (str) – the type of function, ‘log’ (logarithm) or ‘pol’ (polynomial)
power_value (float) – the power of the polynomial, ignored if function_type is ‘log’

class artm.SmoothSparsePhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, kl_function_info=None, config=None)

__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, kl_function_info=None, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer. When gamma is not specified, tau is interpreted as an absolute regularization coefficient; when gamma is specified, tau is interpreted as a relative regularization coefficient.
gamma (float) – the coefficient of topic individualization, from 0 to 1. When gamma is specified, tau is interpreted as a relative regularization coefficient, and the absolute regularization coefficient is calculated by multiplying the relative coefficient by a topic-dependent scaling factor. 0 = all topics share an equal scaling factor. 1 = each topic has an individual scaling factor, irrespective of other topics.
class_ids (list of str or str or None) – list of class_ids or single class_id to regularize, will regularize all classes if empty or None
topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
dictionary (str or reference to Dictionary object) – BigARTM collection dictionary, won’t use dictionary if not specified
kl_function_info (KlFunctionInfo object) – class with additional info about function under KL-div in regularizer
config (protobuf object) – the low-level config of this regularizer
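
The rule that links tau and gamma can be summarized in a few lines. The sketch below is illustrative only: describe_tau is a hypothetical helper, not part of artm, and it encodes nothing beyond the rule stated in the parameter descriptions (gamma unset means tau is absolute; gamma in [0, 1] means tau is relative). It does not compute the topic-dependent scaling factor.

```python
def describe_tau(tau, gamma=None):
    """Report how tau would be interpreted for a given gamma.

    Hypothetical helper for illustration; it is not an artm function.
    """
    if gamma is None:
        # No gamma: tau is used as an absolute regularization coefficient.
        return {"tau": tau, "mode": "absolute"}
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in the range [0, 1]")
    # gamma given: tau is a relative coefficient, later multiplied by a
    # topic-dependent scaling factor inside the library.
    return {"tau": tau, "gamma": gamma, "mode": "relative"}


print(describe_tau(-0.1))
print(describe_tau(0.5, gamma=0.0))
```

A check like this can be run before the values are passed on to the regularizer constructor as tau and gamma.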

class artm.SmoothSparseThetaRegularizer(name=None, tau=1.0, topic_names=None, alpha_iter=None, kl_function_info=None, doc_titles=None, doc_topic_coef=None, config=None)

__init__(name=None, tau=1.0, topic_names=None, alpha_iter=None, kl_function_info=None, doc_titles=None, doc_topic_coef=None, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer
alpha_iter (list of float) – list of additional regularization coefficients, one for each iteration over a document. Should have length equal to model.num_document_passes
topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
kl_function_info (KlFunctionInfo object) – class with additional info about function under KL-div in regularizer
doc_titles (list of strings) – list of titles of documents to be processed by this regularizer. The default empty value means that all documents are processed. The user must guarantee that the document titles exist and are correct in the batches (e.g. in the source data files, such as Vowpal Wabbit (VW) files).
doc_topic_coef (list of floats or list of lists of floats) – two cases: 1) a list of floats with length equal to the number of topics: an additional multiplier in the M-step formula besides alpha and tau, unique for each topic but shared by all processed documents. 2) a list of lists of floats, with outer list length equal to the length of doc_titles and each inner list length equal to the number of topics: the same as case 1, but with a separate list of additional multipliers for each document from doc_titles. Other documents will not be regularized, as described for the doc_titles parameter. Note that doc_topic_coef and topic_names are both taken into account.
config (protobuf object) – the low-level config of this regularizer
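
The two accepted shapes of doc_topic_coef are easy to mix up, so a shape check before constructing the regularizer may help. The sketch below is an assumption-laden illustration: doc_topic_coef_case is a hypothetical helper, not part of artm, and it checks only the length rules stated in the parameter description.

```python
def doc_topic_coef_case(doc_topic_coef, num_topics, doc_titles=None):
    """Classify doc_topic_coef as 'per_topic' (case 1) or 'per_document' (case 2).

    Hypothetical helper for illustration; it is not an artm function.
    """
    if not doc_topic_coef:
        raise ValueError("doc_topic_coef is empty")
    # Case 1: a flat list with one multiplier per topic.
    if all(isinstance(x, (int, float)) for x in doc_topic_coef):
        if len(doc_topic_coef) != num_topics:
            raise ValueError("expected one multiplier per topic")
        return "per_topic"
    # Case 2: one inner list per document title, each with one value per topic.
    if all(isinstance(x, (list, tuple)) for x in doc_topic_coef):
        if not doc_titles or len(doc_topic_coef) != len(doc_titles):
            raise ValueError("expected one inner list per entry of doc_titles")
        if any(len(row) != num_topics for row in doc_topic_coef):
            raise ValueError("each inner list needs one multiplier per topic")
        return "per_document"
    raise TypeError("doc_topic_coef must be a list of floats or a list of lists")


print(doc_topic_coef_case([1.0, 0.5, 2.0], num_topics=3))
print(doc_topic_coef_case([[1.0, 0.5], [0.0, 2.0]], num_topics=2,
                          doc_titles=["doc_1", "doc_2"]))
```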

class artm.DecorrelatorPhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, topic_pairs=None, config=None)

__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, topic_pairs=None, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
class_ids (list of str or str or None) – list of class_ids or single class_id to regularize, will regularize all classes if empty or None
topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
topic_pairs (dict, key - topic name, value - dict with topic names and float values) – information about pairwise topic decorrelation coefficients; if None, all topic names from the topic_names parameter will be used with a coefficient of 1.0.
config (protobuf object) – the low-level config of this regularizer
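
The nested-dict shape expected by topic_pairs can be built programmatically. The following is a minimal sketch under stated assumptions: build_topic_pairs is a hypothetical helper, not part of artm, and filling both directions of every pair is a choice made here for illustration, not a documented requirement.

```python
def build_topic_pairs(topic_names, default=1.0, overrides=None):
    """Build {topic: {other_topic: coefficient}} for all distinct topic pairs.

    Hypothetical helper for illustration; it is not an artm function.
    overrides maps (first_topic, second_topic) tuples to custom coefficients.
    """
    pairs = {
        first: {second: default for second in topic_names if second != first}
        for first in topic_names
    }
    for (first, second), value in (overrides or {}).items():
        # Assumption: a custom coefficient is applied in both directions.
        pairs[first][second] = float(value)
        pairs[second][first] = float(value)
    return pairs


pairs = build_topic_pairs(["topic_0", "topic_1", "topic_2"],
                          overrides={("topic_0", "topic_1"): 0.5})
print(pairs["topic_0"])
```

The resulting dict has the form described for the topic_pairs argument.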

class artm.LabelRegularizationPhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)

__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
class_ids (list of str or str or None) – list of class_ids or single class_id to regularize, will regularize all classes if empty or None
topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
dictionary (str or reference to Dictionary object) – BigARTM collection dictionary, won’t use dictionary if not specified
config (protobuf object) – the low-level config of this regularizer

class artm.SpecifiedSparsePhiRegularizer(name=None, tau=1.0, gamma=None, topic_names=None, class_id=None, num_max_elements=None, probability_threshold=None, sparse_by_columns=True, config=None)

__init__(name=None, tau=1.0, gamma=None, topic_names=None, class_id=None, num_max_elements=None, probability_threshold=None, sparse_by_columns=True, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
class_id (str) – class_id to regularize
topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
num_max_elements (int) – number of elements to save in row/column
probability_threshold (float) – if m elements in a row/column sum to a value >= probability_threshold, where m < n and n is num_max_elements, only these m elements are kept. The value should be in (0, 1), default=None
sparse_by_columns (bool) – whether to find the max elements in each column (True) or in each row (False)
config (protobuf object) – the low-level config of this regularizer
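
The interaction between num_max_elements and probability_threshold can be illustrated on a single row or column. This is a sketch of the rule as described above, not the library's implementation: kept_indices is a hypothetical helper, and it assumes that elements are considered in order of decreasing value.

```python
def kept_indices(values, num_max_elements, probability_threshold=None):
    """Return the indices of the elements that would be kept in one row/column.

    Hypothetical helper for illustration; it is not an artm function.
    """
    # Assumption: candidates are the largest elements, visited in descending order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    kept, total = [], 0.0
    for index in order[:num_max_elements]:
        kept.append(index)
        total += values[index]
        # Stop early once the kept elements reach the probability threshold.
        if probability_threshold is not None and total >= probability_threshold:
            break
    return sorted(kept)


column = [0.5, 0.25, 0.125, 0.125]
print(kept_indices(column, num_max_elements=3))
print(kept_indices(column, num_max_elements=3, probability_threshold=0.7))
```

In the second call, the two largest elements already sum to 0.75, so the third candidate is dropped even though num_max_elements would allow it.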

class artm.ImproveCoherencePhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)

__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
class_ids (list of str or str or None) – list of class_ids or single class_id to regularize, will regularize all classes if empty or None
topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
dictionary (str or reference to Dictionary object) – BigARTM collection dictionary, which should contain pairwise token co-occurrence info; if not specified, no dictionary is used and the regularizer is useless
config (protobuf object) – the low-level config of this regularizer

class artm.SmoothPtdwRegularizer(name=None, tau=1.0, config=None)

__init__(name=None, tau=1.0, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer
config (protobuf object) – the low-level config of this regularizer

class artm.TopicSelectionThetaRegularizer(name=None, tau=1.0, topic_names=None, alpha_iter=None, config=None)

__init__(name=None, tau=1.0, topic_names=None, alpha_iter=None, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer
alpha_iter (list of float) – list of additional regularization coefficients, one for each iteration over a document. Should have length equal to model.num_document_passes
topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
config (protobuf object) – the low-level config of this regularizer

class artm.BitermsPhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)

__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
class_ids (list of str or str or None) – list of class_ids or single class_id to regularize, will regularize all classes if empty or None
topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
dictionary (str or reference to Dictionary object) – BigARTM collection dictionary, which should contain pairwise token co-occurrence info; if not specified, no dictionary is used and the regularizer is useless
config (protobuf object) – the low-level config of this regularizer

class artm.HierarchySparsingThetaRegularizer(name=None, tau=1.0, topic_names=None, alpha_iter=None, parent_topic_proportion=None, config=None)

__init__(name=None, tau=1.0, topic_names=None, alpha_iter=None, parent_topic_proportion=None, config=None)

Description:

This regularizer affects the psi matrix, which contains p(topic|supertopic) values.

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer
alpha_iter (list of float) – list of additional regularization coefficients, one for each iteration over a document. Should have length equal to model.num_document_passes
topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
config (protobuf object) – the low-level config of this regularizer
parent_topic_proportion (list of float) – list of p(supertopic) values, i.e. the p(topic) values of the parent-level model
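
One possible way to obtain p(topic) values for the parent level is to average the parent model's p(topic|document) values over documents. The sketch below is only an assumption-based illustration: topic_proportions is a hypothetical helper, not part of artm, and it assumes every document has equal weight, which may not match how the parent model was actually trained.

```python
def topic_proportions(theta_rows):
    """Estimate p(topic) from rows of p(topic | document) values.

    Hypothetical helper for illustration; it is not an artm function.
    theta_rows[t][d] holds p(topic t | document d).
    Assumption: all documents are weighted equally, p(d) = 1 / num_docs.
    """
    num_docs = len(theta_rows[0])
    return [sum(row) / num_docs for row in theta_rows]


# Two parent topics, two documents.
parent_theta = [
    [0.5, 0.25],
    [0.5, 0.75],
]
print(topic_proportions(parent_theta))
```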

class artm.TopicSegmentationPtdwRegularizer(name=None, window=None, threshold=None, background_topic_names=None, config=None)

__init__(name=None, window=None, threshold=None, background_topic_names=None, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
window (int) – the number of words on one side over which smoothing will be performed
threshold (float) – probability threshold for a word to be a topic-changing word
background_topic_names (list of str or str or None) – list of names or single name of topic to be considered background, will not consider background topics if empty or None
config (protobuf object) – the low-level config of this regularizer

class artm.SmoothTimeInTopicsPhiRegularizer(name=None, tau=1.0, gamma=None, class_id=None, topic_names=None, config=None)

__init__(name=None, tau=1.0, gamma=None, class_id=None, topic_names=None, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
class_id (str) – class_id to regularize
topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
config (protobuf object) – the low-level config of this regularizer

class artm.NetPlsaPhiRegularizer(name=None, tau=1.0, gamma=None, class_id=None, symmetric_edge_weights=None, topic_names=None, vertex_names=None, vertex_weights=None, edge_weights=None, config=None)

__init__(name=None, tau=1.0, gamma=None, class_id=None, symmetric_edge_weights=None, topic_names=None, vertex_names=None, vertex_weights=None, edge_weights=None, config=None)

Parameters:

name (str) – the identifier of regularizer, will be auto-generated if not specified
tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
class_id (str) – name of class_id of special tokens-vertices
topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
edge_weights (dict, key - first token, value - dict with second tokens and float values) – information about edge weights of NetPLSA model, required.
symmetric_edge_weights (bool) – use symmetric edge weights or not
vertex_names (list) – list of tokens-vertices of class_id modality, required.
vertex_weights (list) – list of vertex weights, should have the same length as vertex_names; 1.0 is used for all vertices by default
config (protobuf object) – the low-level config of this regularizer
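
Because vertex_names and vertex_weights must line up, it can be convenient to derive both from the edge_weights dict. The sketch below is a hypothetical helper, not part of artm: it collects every token that appears in edge_weights, sorts the names for a stable order, and fills in the documented default weight of 1.0 when no weights are supplied.

```python
def net_plsa_inputs(edge_weights, vertex_weights=None):
    """Derive (vertex_names, vertex_weights) from a nested edge-weight dict.

    Hypothetical helper for illustration; it is not an artm function.
    edge_weights has the form {first_token: {second_token: weight}}.
    """
    names = set(edge_weights)
    for neighbours in edge_weights.values():
        names.update(neighbours)
    vertex_names = sorted(names)
    if vertex_weights is None:
        # Mirror the documented default: 1.0 for every vertex.
        vertex_weights = [1.0] * len(vertex_names)
    if len(vertex_weights) != len(vertex_names):
        raise ValueError("vertex_weights must match vertex_names in length")
    return vertex_names, list(vertex_weights)


edges = {"a": {"b": 0.5}, "b": {"c": 2.0}}
names, weights = net_plsa_inputs(edges)
print(names, weights)
```

The two returned lists and the original dict correspond to the vertex_names, vertex_weights and edge_weights arguments.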