Regularizers

This page describes the KlFunctionInfo class and the regularizer classes.

See the detailed description of regularizers to understand what each of them does.

class artm.KlFunctionInfo(function_type='log', power_value=2.0)
__init__(function_type='log', power_value=2.0)
Parameters:
  • function_type (str) – the type of function, ‘log’ (logarithm) or ‘pol’ (polynomial)
  • power_value (float) – the power of the polynomial (a floating-point value); ignored if function_type is ‘log’
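As a rough illustration of the two function types, the sketch below evaluates f(x) and the factor x·f'(x) that a term tau·f(phi) would contribute to a multiplicative M-step update. It assumes f(x) = ln x for ‘log’ and f(x) = x**power_value for ‘pol’; the exact form BigARTM uses internally may differ, and both helper names are hypothetical.

```python
import math


def kl_function(x, function_type="log", power_value=2.0):
    """Hypothetical f(x) under the KL divergence (assumed forms, not BigARTM's code)."""
    if function_type == "log":
        return math.log(x)
    if function_type == "pol":
        # Assumption: the 'pol' function is x raised to power_value.
        return x ** power_value
    raise ValueError("function_type must be 'log' or 'pol'")


def update_factor(x, function_type="log", power_value=2.0):
    """Return x * f'(x), the factor a term tau * f(x) adds in phi * dR/dphi."""
    if function_type == "log":
        # d/dx ln x = 1/x, so x * f'(x) == 1 and power_value is ignored.
        return 1.0
    if function_type == "pol":
        # d/dx x**p = p * x**(p - 1), so x * f'(x) == p * x**p.
        return power_value * x ** power_value
    raise ValueError("function_type must be 'log' or 'pol'")
```

With ‘log’ the factor is constant, which is why power_value plays no role in that case.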
class artm.SmoothSparsePhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, kl_function_info=None, config=None)
__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, kl_function_info=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • gamma (float) – the coefficient of relative regularization for this regularizer
  • class_ids (list of str) – list of class_ids to regularize, will regularize all classes if not specified
  • topic_names (list of str) – list of names of topics to regularize, will regularize all topics if not specified
  • dictionary (str or reference to Dictionary object) – BigARTM collection dictionary; no dictionary is used if not specified
  • kl_function_info (KlFunctionInfo object) – additional information about the function under the KL divergence in this regularizer
  • config (protobuf object) – the low-level config of this regularizer
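For intuition about the sign of tau, the sketch below applies the classical ARTM smoothing/sparsing update for the logarithmic case, phi_wt ∝ max(n_wt + tau·beta_w, 0), normalized over tokens within each topic. Here beta_w stands in for per-token weights that a dictionary could supply (uniform when none is given). This is a textbook-style illustration, not BigARTM's internal code; gamma and class_ids are not modeled, and the function name is hypothetical.

```python
def smooth_sparse_phi(n_wt, tau, beta=None):
    """Regularized M-step for one modality: phi[w][t] ∝ max(n_wt + tau * beta_w, 0).

    n_wt: counts as a list of rows (tokens) by columns (topics).
    beta: optional per-token weights; uniform (all 1.0) if omitted.
    """
    num_words, num_topics = len(n_wt), len(n_wt[0])
    if beta is None:
        beta = [1.0] * num_words
    # Add the regularization term and clip negative values to zero.
    r = [[max(n_wt[w][t] + tau * beta[w], 0.0) for t in range(num_topics)]
         for w in range(num_words)]
    phi = [[0.0] * num_topics for _ in range(num_words)]
    for t in range(num_topics):
        # Normalize each topic column to sum to 1 (left all-zero if nothing survives).
        total = sum(r[w][t] for w in range(num_words))
        if total > 0:
            for w in range(num_words):
                phi[w][t] = r[w][t] / total
    return phi


n = [[5.0, 0.0], [1.0, 4.0], [0.0, 1.0]]
sparse = smooth_sparse_phi(n, tau=-1.0)  # negative tau: sparsing
smooth = smooth_sparse_phi(n, tau=1.0)   # positive tau: smoothing
```

With tau = -1.0, every count of 1 or less is zeroed out; with a positive tau, every entry stays strictly positive.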
class artm.SmoothSparseThetaRegularizer(name=None, tau=1.0, topic_names=None, alpha_iter=None, kl_function_info=None, doc_titles=None, doc_topic_coef=None, config=None)
__init__(name=None, tau=1.0, topic_names=None, alpha_iter=None, kl_function_info=None, doc_titles=None, doc_topic_coef=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • alpha_iter (list of float) – list of additional regularization coefficients, one for each pass over the document. Its length should equal model.num_document_passes
  • topic_names (list of str) – list of names of topics to regularize, will regularize all topics if not specified
  • kl_function_info (KlFunctionInfo object) – additional information about the function under the KL divergence in this regularizer
  • doc_titles (list of str) – list of titles of documents to be processed by this regularizer. The default empty value means that all documents are processed. The user must ensure that these document titles exist and are correct in the batches (e.g. in the source data files, such as VW (Vowpal Wabbit) files).
  • doc_topic_coef (list of float or list of lists of float) – additional per-topic multipliers in the M-step formula, applied besides alpha and tau. Two forms are accepted: 1) a list of floats with length equal to the number of topics, giving one multiplier per topic shared by all processed documents; 2) a list of lists of floats, where the outer list has the same length as doc_titles and each inner list has length equal to the number of topics, giving a separate list of multipliers for each document in doc_titles. In case 2, other documents are not regularized, as described for doc_titles. Note that doc_topic_coef and topic_names are applied together.
  • config (protobuf object) – the low-level config of this regularizer
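The length rules for alpha_iter, doc_titles and doc_topic_coef can be summarized as a small check. The helper below is hypothetical (it is not part of the artm package) and treats num_topics as the number of topics the coefficients refer to.

```python
def check_theta_args(num_topics, num_document_passes,
                     alpha_iter=None, doc_titles=None, doc_topic_coef=None):
    """Return a list of problems with SmoothSparseThetaRegularizer-style arguments."""
    problems = []
    if alpha_iter is not None and len(alpha_iter) != num_document_passes:
        problems.append("alpha_iter length must equal num_document_passes")
    if doc_topic_coef:
        if isinstance(doc_topic_coef[0], (list, tuple)):
            # Case 2: one coefficient list per document listed in doc_titles.
            if not doc_titles or len(doc_topic_coef) != len(doc_titles):
                problems.append("doc_topic_coef must have one row per doc_titles entry")
            if any(len(row) != num_topics for row in doc_topic_coef):
                problems.append("each doc_topic_coef row must have num_topics values")
        elif len(doc_topic_coef) != num_topics:
            # Case 1: one coefficient per topic, shared by all documents.
            problems.append("doc_topic_coef must have num_topics values")
    return problems
```

An empty result means the arguments are consistent with the rules above; it says nothing about whether the titles actually exist in the batches.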
class artm.DecorrelatorPhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, config=None)
__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • gamma (float) – the coefficient of relative regularization for this regularizer
  • class_ids (list of str) – list of class_ids to regularize, will regularize all classes if not specified
  • topic_names (list of str) – list of names of topics to regularize, will regularize all topics if not specified
  • config (protobuf object) – the low-level config of this regularizer
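A minimal sketch of the decorrelation idea, assuming the textbook ARTM decorrelator: each phi_wt receives the additive term -tau·phi_wt·Σ_{s≠t} phi_ws, which penalizes tokens that are probable in several topics at once. This illustrates that assumed formula, not BigARTM's code; gamma and class_ids are not modeled, and the function name is hypothetical.

```python
def decorrelator_term(phi, tau):
    """Return r[w][t] = -tau * phi[w][t] * sum of phi[w][s] over topics s != t."""
    result = []
    for row in phi:
        row_sum = sum(row)
        # Subtract phi[w][t] from the row sum to exclude topic t itself.
        result.append([-tau * p * (row_sum - p) for p in row])
    return result


terms = decorrelator_term([[0.5, 0.5], [1.0, 0.0]], tau=1.0)
```

A token concentrated in a single topic (second row) is not penalized, while one spread evenly across two topics (first row) is.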
class artm.LabelRegularizationPhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)
__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • gamma (float) – the coefficient of relative regularization for this regularizer
  • class_ids (list of str) – list of class_ids to regularize, will regularize all classes if not specified
  • topic_names (list of str) – list of names of topics to regularize, will regularize all topics if not specified
  • dictionary (str or reference to Dictionary object) – BigARTM collection dictionary; no dictionary is used if not specified
  • config (protobuf object) – the low-level config of this regularizer
class artm.SpecifiedSparsePhiRegularizer(name=None, tau=1.0, gamma=None, topic_names=None, class_id=None, num_max_elements=None, probability_threshold=None, sparse_by_columns=True, config=None)
__init__(name=None, tau=1.0, gamma=None, topic_names=None, class_id=None, num_max_elements=None, probability_threshold=None, sparse_by_columns=True, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • gamma (float) – the coefficient of relative regularization for this regularizer
  • class_id (str) – the class_id to regularize
  • topic_names (list of str) – list of names of topics to regularize, will regularize all topics if not specified
  • num_max_elements (int) – the maximum number of elements to keep in each row/column
  • probability_threshold (float) – if the m largest elements of a row/column already sum to a value >= probability_threshold, with m < n, only these m elements are kept. The value should be in (0, 1); default is None
  • sparse_by_columns (bool) – if True, find the max elements in each column; otherwise, in each row
  • config (protobuf object) – the low-level config of this regularizer
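The selection rule can be sketched as follows. This is an assumption-based illustration of how num_max_elements, probability_threshold and sparse_by_columns interact as described above, not BigARTM's implementation: for each column (or row), it keeps the largest entries and stops early once their sum reaches the threshold. The function name is hypothetical.

```python
def specified_sparse_mask(phi, num_max_elements, probability_threshold=None,
                          sparse_by_columns=True):
    """Return a boolean mask of the phi entries that would be kept."""
    num_rows, num_cols = len(phi), len(phi[0])
    keep = [[False] * num_cols for _ in range(num_rows)]
    num_lines = num_cols if sparse_by_columns else num_rows
    for i in range(num_lines):
        # Collect (row, col) coordinates of the current column or row.
        if sparse_by_columns:
            cells = [(r, i) for r in range(num_rows)]
        else:
            cells = [(i, c) for c in range(num_cols)]
        # Visit entries from largest to smallest.
        cells.sort(key=lambda rc: phi[rc[0]][rc[1]], reverse=True)
        total = 0.0
        for r, c in cells[:num_max_elements]:
            keep[r][c] = True
            total += phi[r][c]
            # Stop early once the kept mass reaches the threshold.
            if probability_threshold is not None and total >= probability_threshold:
                break
    return keep


phi = [[0.7, 0.1], [0.2, 0.3], [0.1, 0.6]]
by_columns = specified_sparse_mask(phi, 2)
with_threshold = specified_sparse_mask(phi, 2, probability_threshold=0.6)
by_rows = specified_sparse_mask(phi, 1, sparse_by_columns=False)
```

With the threshold set, each column keeps only its single largest entry here, because that entry alone already reaches 0.6.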
class artm.ImproveCoherencePhiRegularizer(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)
__init__(name=None, tau=1.0, gamma=None, class_ids=None, topic_names=None, dictionary=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • gamma (float) – the coefficient of relative regularization for this regularizer
  • class_ids (list of str) – list of class_ids to regularize, will regularize all classes if not specified
  • topic_names (list of str) – list of names of topics to regularize, will regularize all topics if not specified
  • dictionary (str or reference to Dictionary object) – BigARTM collection dictionary, which should contain pairwise token co-occurrence information; if not specified, no dictionary is used and the regularizer has no effect
  • config (protobuf object) – the low-level config of this regularizer
class artm.SmoothPtdwRegularizer(name=None, tau=1.0, config=None)
__init__(name=None, tau=1.0, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • config (protobuf object) – the low-level config of this regularizer
class artm.TopicSelectionThetaRegularizer(name=None, tau=1.0, topic_names=None, alpha_iter=None, config=None)
__init__(name=None, tau=1.0, topic_names=None, alpha_iter=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • tau (float) – the coefficient of regularization for this regularizer
  • alpha_iter (list of float) – list of additional regularization coefficients, one for each pass over the document. Its length should equal model.num_document_passes
  • topic_names (list of str) – list of names of topics to regularize, will regularize all topics if not specified
  • config (protobuf object) – the low-level config of this regularizer
class artm.TopicSegmentationPtdwRegularizer(name=None, window=None, threshold=None, background_topic_names=None, config=None)
__init__(name=None, window=None, threshold=None, background_topic_names=None, config=None)
Parameters:
  • name (str) – the identifier of regularizer, will be auto-generated if not specified
  • window (int) – the number of words on one side of the current word over which smoothing is performed
  • threshold (float) – probability threshold for a word to be a topic-changing word
  • background_topic_names (list of str) – list of names of topics to be considered background; if not specified, no topics are treated as background
  • config (protobuf object) – the low-level config of this regularizer