Regularizers

This page describes the KlFunctionInfo class and the regularizer classes.

See Regularizers Description for a detailed explanation of what each regularizer does.

class artm.KlFunctionInfo(function_type='log', power_value=2.0)
__init__(function_type='log', power_value=2.0)
Parameters:
  • function_type (str) – the type of function, ‘log’ (logarithm) or ‘pol’ (polynomial)
  • power_value (float) – the power of the polynomial, ignored if function_type = ‘log’
class artm.SmoothSparsePhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, kl_function_info=None, config=None)
__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, kl_function_info=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer. When gamma is not specified, tau is interpreted as an absolute regularization coefficient; when gamma is specified, tau is interpreted as a relative regularization coefficient.
  • gamma (float) – coefficient of topic individualization, from 0 to 1. When gamma is specified, tau is interpreted as a relative regularization coefficient, and the absolute coefficient is obtained by multiplying it by a topic-dependent scaling factor. With gamma = 0 all topics share the same scaling factor; with gamma = 1 each topic has its own scaling factor, independent of the other topics.
  • class_ids (list of str or str or None) – list of class_ids or single class_id to regularize, will regularize all classes if empty or None
  • topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
  • dictionary (str or reference to Dictionary object) – BigARTM collection dictionary, won’t use dictionary if not specified
  • kl_function_info (KlFunctionInfo object) – class with additional info about function under KL-div in regularizer
  • config (protobuf object) – the low-level config of this regularizer
class artm.SmoothSparseThetaRegularizer(name=None, tau=1.0, topic_names=None, alpha_iter=None, kl_function_info=None, doc_titles=None, doc_topic_coef=None, config=None)
__init__(name=None, tau=1.0, topic_names=None, alpha_iter=None, kl_function_info=None, doc_titles=None, doc_topic_coef=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • alpha_iter (list of float) – list of additional regularization coefficients, one for each iteration over a document. Should have length equal to model.num_document_passes
  • topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
  • kl_function_info (KlFunctionInfo object) – class with additional info about function under KL-div in regularizer
  • doc_titles (list of str) – list of titles of documents to be processed by this regularizer. The default empty value means that all documents are processed. The user must guarantee that the document titles exist and are correct in the batches (e.g. in the source data files, such as Vowpal Wabbit (VW) files).
  • doc_topic_coef (list of floats or list of lists of floats) – two cases: 1) a list of floats with length equal to the number of topics: an additional multiplier in the M-step formula besides alpha and tau, unique for each topic but shared by all processed documents; 2) a list of lists of floats, with the outer list length equal to the length of doc_titles and each inner list length equal to the number of topics: the same as case 1, but with a separate list of multipliers for each document in doc_titles. Other documents are not regularized, as described for the doc_titles parameter. Note that doc_topic_coef and topic_names are both taken into account.
  • config (protobuf object) – the low-level config of this regularizer
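The two accepted shapes of doc_topic_coef can be checked before the regularizer is constructed. The following is a minimal sketch in plain Python; the helper name check_doc_topic_coef and its return labels are hypothetical and are not part of the artm API.

```python
def check_doc_topic_coef(doc_topic_coef, num_topics, doc_titles=None):
    """Classify doc_topic_coef as one of the two documented cases.

    Hypothetical helper for illustration only; not provided by artm.
    """
    if all(isinstance(x, (int, float)) for x in doc_topic_coef):
        # Case 1: one multiplier per topic, shared by all processed documents.
        if len(doc_topic_coef) != num_topics:
            raise ValueError("expected one coefficient per topic")
        return "per_topic"

    # Case 2: one inner list per entry of doc_titles,
    # each inner list holding one multiplier per topic.
    if doc_titles is None or len(doc_topic_coef) != len(doc_titles):
        raise ValueError("outer list must have the same length as doc_titles")
    if any(len(row) != num_topics for row in doc_topic_coef):
        raise ValueError("each inner list needs one coefficient per topic")
    return "per_document"


shared = [1.0, 0.5, 0.0]
per_doc = [[1.0, 0.5, 0.0], [0.0, 1.0, 1.0]]
print(check_doc_topic_coef(shared, num_topics=3))
print(check_doc_topic_coef(per_doc, num_topics=3, doc_titles=["doc_a", "doc_b"]))
```

A list that passes this check would then be given as doc_topic_coef (together with doc_titles in the second case) to SmoothSparseThetaRegularizer.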
class artm.DecorrelatorPhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, topic_pairs=None, config=None)
__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, topic_pairs=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
  • gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
  • class_ids (list of str or str or None) – list of class_ids or single class_id to regularize, will regularize all classes if empty or None
  • topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
  • topic_pairs (dict, key - topic name, value - dict with topic names and float values) – information about pairwise topic decorrelation coefficients; if None, all topic names from the topic_names parameter are used with a coefficient of 1.0.
  • config (protobuf object) – the low-level config of this regularizer
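The nested layout expected by topic_pairs (topic name mapped to a dict of other topic names and float coefficients) can be built programmatically. The sketch below is plain Python; the helper uniform_topic_pairs and the topic names are hypothetical, and whether the library's own default also pairs a topic with itself is not stated here, so self-pairs are left out.

```python
def uniform_topic_pairs(topic_names, coefficient=1.0):
    """Build {topic: {other_topic: coefficient}} for all pairs of distinct topics.

    Hypothetical helper for illustration only; not provided by artm.
    """
    return {
        first: {second: coefficient for second in topic_names if second != first}
        for first in topic_names
    }


# Start from equal coefficients, then strengthen one specific pair.
pairs = uniform_topic_pairs(["topic_0", "topic_1", "topic_2"])
pairs["topic_0"]["topic_1"] = 2.0
print(pairs["topic_0"])
```

The resulting dict has the shape described for the topic_pairs parameter of DecorrelatorPhiRegularizer.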
class artm.LabelRegularizationPhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)
__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
  • gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
  • class_ids (list of str or str or None) – list of class_ids or single class_id to regularize, will regularize all classes if empty or None
  • topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
  • dictionary (str or reference to Dictionary object) – BigARTM collection dictionary, won’t use dictionary if not specified
  • config (protobuf object) – the low-level config of this regularizer
class artm.SpecifiedSparsePhiRegularizer(name=None, tau=1.0, gamma=None, topic_names=None, class_id=None, num_max_elements=None, probability_threshold=None, sparse_by_columns=True, config=None)
__init__(name=None, tau=1.0, gamma=None, topic_names=None, class_id=None, num_max_elements=None, probability_threshold=None, sparse_by_columns=True, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
  • gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
  • class_id (str) – class_id to regularize
  • topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
  • num_max_elements (int) – number of elements to save in row/column
  • probability_threshold (float) – if m elements in a row/column sum to a value >= probability_threshold and m < n, only these m elements are kept. The value should be in (0, 1), default=None
  • sparse_by_columns (bool) – whether to find the max elements in each column (True) or in each row (False)
  • config (protobuf object) – the low-level config of this regularizer
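One way to read the interaction of num_max_elements and probability_threshold is sketched below for a single row or column. This is an interpretation written in plain Python, not the library's implementation: it assumes that the largest elements are considered first, that n in the description refers to num_max_elements, and that selection stops early once the kept mass reaches the threshold. The function name select_kept_indices is hypothetical.

```python
def select_kept_indices(values, num_max_elements, probability_threshold=None):
    """Return the indices of the elements that would be kept in one row/column.

    Illustrative interpretation only; the actual artm behaviour may differ.
    """
    # Visit elements from largest to smallest (ties keep their original order).
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)

    kept, total = [], 0.0
    for index in order[:num_max_elements]:
        kept.append(index)
        total += values[index]
        # Stop early once the kept elements already carry enough mass.
        if probability_threshold is not None and total >= probability_threshold:
            break
    return kept


column = [0.5, 0.1, 0.3, 0.1]
print(select_kept_indices(column, num_max_elements=3))
print(select_kept_indices(column, num_max_elements=3, probability_threshold=0.75))
```

In this sketch the threshold can only reduce the number of kept elements below num_max_elements, never increase it.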
class artm.ImproveCoherencePhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)
__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
  • gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
  • class_ids (list of str or str or None) – list of class_ids or single class_id to regularize, will regularize all classes if empty or None
  • topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
  • dictionary (str or reference to Dictionary object) – BigARTM collection dictionary, which should contain pairwise token co-occurrence info; the dictionary is not used if not specified, in which case the regularizer is useless
  • config (protobuf object) – the low-level config of this regularizer
class artm.SmoothPtdwRegularizer(name=None, tau=1.0, config=None)
__init__(name=None, tau=1.0, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • config (protobuf object) – the low-level config of this regularizer
class artm.TopicSelectionThetaRegularizer(name=None, tau=1.0, topic_names=None, alpha_iter=None, config=None)
__init__(name=None, tau=1.0, topic_names=None, alpha_iter=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • alpha_iter (list of float) – list of additional regularization coefficients, one for each iteration over a document. Should have length equal to model.num_document_passes
  • topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
  • config (protobuf object) – the low-level config of this regularizer
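Because alpha_iter must contain exactly one entry per document pass, it can be convenient to generate it from the number of passes. The sketch below is plain Python and assumes the coefficients are numeric; the helper make_alpha_iter and its warm-up schedule (zero for the first passes, a constant afterwards) are hypothetical choices for illustration, not a recommended setting.

```python
def make_alpha_iter(num_document_passes, warmup_passes=0, coefficient=1.0):
    """Build a per-pass coefficient list of length num_document_passes.

    Hypothetical helper: the first warmup_passes entries are 0.0,
    the remaining entries are set to coefficient.
    """
    if not 0 <= warmup_passes <= num_document_passes:
        raise ValueError("warmup_passes must be between 0 and num_document_passes")
    active_passes = num_document_passes - warmup_passes
    return [0.0] * warmup_passes + [float(coefficient)] * active_passes


# num_document_passes here stands in for model.num_document_passes.
alpha_iter = make_alpha_iter(num_document_passes=5, warmup_passes=2)
print(alpha_iter)
```

The same length requirement applies to the alpha_iter parameter of the other Theta regularizers on this page.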
class artm.BitermsPhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)
__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
  • gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
  • class_ids (list of str or str or None) – list of class_ids or single class_id to regularize, will regularize all classes if empty or None
  • topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
  • dictionary (str or reference to Dictionary object) – BigARTM collection dictionary, which should contain pairwise token co-occurrence info; the dictionary is not used if not specified, in which case the regularizer is useless
  • config (protobuf object) – the low-level config of this regularizer
class artm.HierarchySparsingThetaRegularizer(name=None, tau=1.0, topic_names=None, alpha_iter=None, parent_topic_proportion=None, config=None)
__init__(name=None, tau=1.0, topic_names=None, alpha_iter=None, parent_topic_proportion=None, config=None)
Description:

This regularizer affects the psi matrix, which contains p(topic|supertopic) values.

Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • alpha_iter (list of float) – list of additional regularization coefficients, one for each iteration over a document. Should have length equal to model.num_document_passes
  • topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
  • config (protobuf object) – the low-level config of this regularizer
  • parent_topic_proportion (list of float) – list of p(supertopic) values that are p(topic) of parent level model
class artm.TopicSegmentationPtdwRegularizer(name=None, window=None, threshold=None, background_topic_names=None, config=None)
__init__(name=None, window=None, threshold=None, background_topic_names=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • window (int) – the number of words on one side over which smoothing is performed
  • threshold (float) – probability threshold for a word to be a topic-changing word
  • background_topic_names (list of str or str or None) – list of names or single name of topic to be considered background, will not consider background topics if empty or None
  • config (protobuf object) – the low-level config of this regularizer
class artm.SmoothTimeInTopicsPhiRegularizer(name=None, tau=1.0, gamma=None, class_id=None, topic_names=None, config=None)
__init__(name=None, tau=1.0, gamma=None, class_id=None, topic_names=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
  • gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
  • class_id (str) – class_id to regularize
  • topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
  • config (protobuf object) – the low-level config of this regularizer
class artm.NetPlsaPhiRegularizer(name=None, tau=1.0, gamma=None, class_id=None, symmetric_edge_weights=None, topic_names=None, vertex_names=None, vertex_weights=None, edge_weights=None, config=None)
__init__(name=None, tau=1.0, gamma=None, class_id=None, symmetric_edge_weights=None, topic_names=None, vertex_names=None, vertex_weights=None, edge_weights=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer. See SmoothSparsePhiRegularizer documentation for further details.
  • gamma (float) – coefficient of topics individualization. See SmoothSparsePhiRegularizer documentation for further details.
  • class_id (str) – name of class_id of special tokens-vertices
  • topic_names (list of str or single str or None) – list of names or single name of topic to regularize, will regularize all topics if empty or None
  • edge_weights (dict, key - first token, value - dict with second tokens and float values) – information about edge weights of NetPLSA model, required.
  • symmetric_edge_weights (bool) – use symmetric edge weights or not
  • vertex_names (list) – list of tokens-vertices of class_id modality, required.
  • vertex_weights (list) – list of vertex weights, should have the same length as vertex_names; 1.0 is used for all vertices by default
  • config (protobuf object) – the low-level config of this regularizer
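The edge_weights, vertex_names and vertex_weights arguments have to agree with each other in shape. The following plain-Python sketch prepares them from a list of weighted edges; the helper build_edge_weights, the example tokens and the weights are all hypothetical, and the sketch only builds the input structures without calling artm.

```python
def build_edge_weights(edges):
    """Convert (first_token, second_token, weight) triples into
    the nested dict layout {first_token: {second_token: weight}}.

    Hypothetical helper for illustration only; not provided by artm.
    """
    edge_weights = {}
    for first, second, weight in edges:
        edge_weights.setdefault(first, {})[second] = float(weight)
    return edge_weights


# Tokens of the vertex modality (made-up example values).
vertex_names = ["2001", "2002", "2003"]

# One weight per vertex; 1.0 for every vertex mirrors the documented default.
vertex_weights = [1.0] * len(vertex_names)

edge_weights = build_edge_weights([
    ("2001", "2002", 0.5),
    ("2002", "2003", 2.0),
])
print(edge_weights)
```

These three objects match the parameter descriptions above and would be passed, together with class_id, to NetPlsaPhiRegularizer.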