Messages¶

This document explains all protobuf messages that can be transferred between the user code and the BigARTM library.

Warning

Remember that all fields are marked as optional to enhance backwards compatibility of the binary protobuf format. Some fields will result in a run-time exception when not specified. Please refer to the documentation of each field for more details.

Note that we discourage any usage of fields marked as obsolete. Those fields will be removed in future releases.

DoubleArray¶

class messages_pb2.DoubleArray¶

Represents an array of double-precision floating point values.

message DoubleArray {
  repeated double value = 1 [packed = true];
}

FloatArray¶

class messages_pb2.FloatArray¶

Represents an array of single-precision floating point values.

message FloatArray {
  repeated float value = 1 [packed = true];
}

BoolArray¶

class messages_pb2.BoolArray¶

Represents an array of boolean values.

message BoolArray {
  repeated bool value = 1 [packed = true];
}

IntArray¶

class messages_pb2.IntArray¶

Represents an array of integer values.

message IntArray {
  repeated int32 value = 1 [packed = true];
}

Item¶

class messages_pb2.Item¶

Represents a unit of textual information. A typical example of an item is a document that belongs to some text collection.

message Item {
  optional int32 id = 1;
  repeated Field field = 2;
  optional string title = 3;
}

Item.id¶: An integer identifier of the item.

Item.field¶: A set of all fields within the item.

Item.title¶: An optional title of the item.

Field¶

class messages_pb2.Field¶

Represents a field within an item. The idea behind fields is that each item might have its title, author, body, abstract, actual text, links, year of publication, etc. Each of these entities should be represented as a Field. The topic model defines how those fields should be taken into account when BigARTM infers a topic model. Currently each field is represented as a “bag-of-words”: each token is listed together with the number of its occurrences. Note that each Field is always part of an Item, an Item is part of a Batch, and a batch always contains a list of tokens. Therefore, each Field just lists the indexes of tokens in the Batch.

message Field {
  optional string name = 1 [default = "@body"];
  repeated int32 token_id = 2;
  repeated int32 token_count = 3;
  repeated int32 token_offset = 4;

  optional string string_value = 5;
  optional int64 int_value = 6;
  optional double double_value = 7;
  optional string date_value = 8;

  repeated string string_array = 16;
  repeated int64 int_array = 17;
  repeated double double_array = 18;
  repeated string date_array = 19;
}
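The bag-of-words layout described above can be illustrated with a short Python sketch. It uses plain dictionaries as stand-ins for the generated messages_pb2 classes, and the helper name build_batch is hypothetical; it only shows how Field.token_id points into the batch-level token list while Field.token_count carries occurrence counts.

```python
from collections import Counter


def build_batch(docs):
    """Build a Batch-like dict from tokenized documents (hypothetical helper).

    Plain-dict stand-in for Batch/Item/Field: every field stores indexes into
    the batch-level "token" list plus the number of occurrences of each token.
    """
    token = []      # mirrors Batch.token
    index_of = {}   # token string -> its position in `token`
    items = []
    for item_id, words in enumerate(docs):
        counts = Counter(words)  # preserves first-appearance order
        token_id, token_count = [], []
        for word, count in counts.items():
            if word not in index_of:
                index_of[word] = len(token)
                token.append(word)
            token_id.append(index_of[word])
            token_count.append(count)
        field = {"name": "@body", "token_id": token_id, "token_count": token_count}
        items.append({"id": item_id, "field": [field]})
    return {"token": token, "item": items}
```

For example, build_batch([["cat", "dog", "cat"], ["dog", "fish"]]) shares the token "dog" between both items, so the second item's field refers to index 1 instead of storing the string again.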

Batch¶

class messages_pb2.Batch¶

Represents a set of items. In BigARTM a batch is never split into smaller parts. When it comes to concurrency this means that each batch goes to a single processor. Two batches can be processed concurrently, but items in one batch are always processed sequentially.

message Batch {
  repeated string token = 1;
  repeated Item item = 2;
  repeated string class_id = 3;
  optional string description = 4;
  optional string id = 5;
}

Batch.token¶: A set of values that defines all tokens that may appear in the batch.

Batch.item¶: A set of items of the batch.

Batch.class_id¶: A set of values that define the classes (modalities) of tokens. This repeated field must have the same length as token. This value is optional; use an empty list to indicate that all tokens belong to the default class.

Batch.description¶: An optional text description of the batch. You may describe for example the source of the batch, preprocessing technique and the structure of its fields.

Batch.id¶: Unique identifier of the batch in a form of a GUID (example: 4fb38197-3f09-4871-9710-392b14f00d2e). This field is required.

Stream¶

class messages_pb2.Stream¶

Represents a configuration of a stream. Streams provide a mechanism to split the entire collection into virtual subsets (for example, the ‘train’ and ‘test’ streams).

message Stream {
  enum Type {
    Global = 0;
    ItemIdModulus = 1;
  }

  optional Type type = 1 [default = Global];
  optional string name = 2 [default = "@global"];
  optional int32 modulus = 3;
  repeated int32 residuals = 4;
}

Stream.type¶

A value that defines the type of the stream.

`Global`	Defines a stream containing all items in the collection.
`ItemIdModulus`	Defines a stream containing all items whose ID matches modulus and residuals. An item belongs to the stream if and only if the remainder of its ID divided by modulus is contained in the residuals field.
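A minimal Python sketch of the membership rule for the two stream types, for example to split a collection into ‘train’ and ‘test’ streams. The function in_stream is hypothetical, and string type names stand in for the Stream.Type enum values; note that Python's % operator differs from C++ for negative IDs.

```python
def in_stream(item_id, stream_type, modulus=None, residuals=()):
    """Return True if an item belongs to the stream (sketch of Stream semantics)."""
    if stream_type == "Global":
        # A Global stream contains every item of the collection.
        return True
    if stream_type == "ItemIdModulus":
        # The item belongs to the stream iff item_id mod modulus is in residuals.
        return item_id % modulus in residuals
    raise ValueError("unknown stream type: %r" % stream_type)


# A 90/10 split: residuals 0..8 go to 'train', residual 9 goes to 'test'.
train = dict(stream_type="ItemIdModulus", modulus=10, residuals=range(9))
test = dict(stream_type="ItemIdModulus", modulus=10, residuals=[9])
```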

Stream.name¶: A value that defines the name of the stream. The name must be unique across all streams defined in the master component.

MasterComponentConfig¶

class messages_pb2.MasterComponentConfig¶

Represents a configuration of a master component.

message MasterComponentConfig {
  optional string disk_path = 2;
  repeated Stream stream = 3;
  optional bool compact_batches = 4 [default = true];
  optional bool cache_theta = 5 [default = false];
  optional int32 processors_count = 6 [default = 1];
  optional int32 processor_queue_max_size = 7 [default = 10];
  optional int32 merger_queue_max_size = 8 [default = 10];
  repeated ScoreConfig score_config = 9;
  optional bool online_batch_processing = 13 [default = false];  // obsolete in BigARTM v0.5.8
  optional string disk_cache_path = 15;
}

MasterComponentConfig.disk_path¶: A value that defines the disk location to store or load the collection.

MasterComponentConfig.stream¶: A set of all data streams to configure in master component. Streams can overlap if needed.

MasterComponentConfig.compact_batches¶: A flag indicating whether to compact batches in AddBatch() operation. Compaction is a process that shrinks the dictionary of each batch by removing all unused tokens.
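The compaction idea can be sketched in Python on a plain-dict batch layout. This is an illustration of "remove unused tokens and renumber the rest", not BigARTM's actual AddBatch() implementation; the helper compact_batch is hypothetical.

```python
def compact_batch(batch):
    """Drop tokens that no field references and renumber token_id (sketch)."""
    # Collect every token index referenced by any field of any item.
    used = sorted({tid
                   for item in batch["item"]
                   for field in item["field"]
                   for tid in field["token_id"]})
    # Map old token positions to new, dense positions.
    remap = {old: new for new, old in enumerate(used)}
    # Keep batch-level attributes such as "id" or "description" untouched.
    compacted = {k: v for k, v in batch.items()
                 if k not in ("token", "class_id", "item")}
    compacted["token"] = [batch["token"][old] for old in used]
    if batch.get("class_id"):
        # class_id runs parallel to token, so it is filtered the same way.
        compacted["class_id"] = [batch["class_id"][old] for old in used]
    compacted["item"] = []
    for item in batch["item"]:
        fields = [dict(field, token_id=[remap[t] for t in field["token_id"]])
                  for field in item["field"]]
        compacted["item"].append(dict(item, field=fields))
    return compacted
```

Token counts are left as they are, since compaction only changes where each token sits in the batch dictionary.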

MasterComponentConfig.cache_theta¶: A flag indicating whether to cache the theta matrix. The theta matrix defines the discrete probability distribution of each document across the topics in the topic model. By default BigARTM infers this distribution every time it processes the document. Option ‘cache_theta’ allows caching this theta matrix and re-using theta values when the same document is processed on the next iteration. This option must be set to ‘true’ before calling method ArtmRequestThetaMatrix().

MasterComponentConfig.processors_count¶: A value that defines the number of concurrent processor components. The number of processors should normally not exceed the number of CPU cores.

MasterComponentConfig.processor_queue_max_size¶

A value that defines the maximal size of the processor queue. The processor queue contains batches prefetched from disk into memory. As a recommendation, the queue size should be at least as large as the number of concurrent processors.

MasterComponentConfig.merger_queue_max_size¶: A value that defines the maximal size of the merger queue. The merger queue contains incremental updates of the topic model, produced by processor components. Try reducing this parameter if BigARTM consumes too much memory.

MasterComponentConfig.score_config¶: A set of all scores available for calculation.

MasterComponentConfig.online_batch_processing¶: Obsolete in BigARTM v0.5.8.

MasterComponentConfig.disk_cache_path¶: A value that defines a writable disk location where this master component can store some temporary files. This can reduce memory usage, particularly when the cache_theta option is enabled. Note that on a clean shutdown the master component will clean this folder automatically; otherwise it is your responsibility to clean this folder to avoid running out of disk space.

ModelConfig¶

class messages_pb2.ModelConfig¶

Represents a configuration of a topic model.

message ModelConfig {
  optional string name = 1 [default = "@model"];
  optional int32 topics_count = 2 [default = 32];
  repeated string topic_name = 3;
  optional bool enabled = 4 [default = true];
  optional int32 inner_iterations_count = 5 [default = 10];
  optional string field_name = 6 [default = "@body"];  // obsolete in BigARTM v0.5.8
  optional string stream_name = 7 [default = "@global"];
  repeated string score_name = 8;
  optional bool reuse_theta = 9 [default = false];
  repeated string regularizer_name = 10;
  repeated double regularizer_tau = 11;
  repeated string class_id = 12;
  repeated float class_weight = 13;
  optional bool use_sparse_bow = 14 [default = true];
  optional bool use_random_theta = 15 [default = false];
  optional bool use_new_tokens = 16 [default = true];
  optional bool opt_for_avx = 17 [default = true];
}

ModelConfig.name¶: A value that defines the name of the topic model. The name must be unique across all models defined in the master component.

ModelConfig.topics_count¶: A value that defines the number of topics in the topic model.

ModelConfig.topic_name¶: A repeated field that defines the names of the topics. All topic names must be unique within each topic model. This field is optional, but either topics_count or topic_name must be specified. If both are specified, then topics_count will be ignored, and the number of topics in the model will be based on the length of the topic_name field. When topic_name is not specified, the names for all topics will be autogenerated.
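The precedence between topic_name and topics_count can be expressed as a short Python sketch. The function resolve_topic_names is hypothetical, and the "topic_%d" pattern for autogenerated names is a placeholder assumption; BigARTM's actual autogenerated names may look different.

```python
def resolve_topic_names(topics_count=32, topic_name=()):
    """Return the topic names a model ends up with (sketch).

    The default of 32 mirrors the ModelConfig.topics_count default.
    """
    if topic_name:
        if len(set(topic_name)) != len(topic_name):
            raise ValueError("topic names must be unique within a model")
        return list(topic_name)  # topics_count is ignored in this case
    if topics_count <= 0:
        raise ValueError("either topics_count or topic_name must be specified")
    # Hypothetical naming scheme; the real autogenerated names may differ.
    return ["topic_%d" % i for i in range(topics_count)]
```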

ModelConfig.enabled¶: A flag indicating whether to update the model during iterations.

ModelConfig.inner_iterations_count¶: A value that defines the fixed number of iterations, performed to infer the theta distribution for each document.

ModelConfig.field_name¶: Obsolete in BigARTM v0.5.8

ModelConfig.stream_name¶: A value that defines which stream the model should use.

ModelConfig.score_name¶: A set of names that defines which scores should be calculated for the model.

ModelConfig.reuse_theta¶: A flag indicating whether the model should reuse theta values cached on the previous iterations. This option requires the cache_theta flag to be set to ‘true’ in MasterComponentConfig.

ModelConfig.regularizer_name¶: A set of names that define which regularizers should be enabled for the model. This repeated field must have the same length as regularizer_tau.

ModelConfig.regularizer_tau¶: A set of values that define the regularization coefficients of the corresponding regularizer. This repeated field must have the same length as regularizer_name.

ModelConfig.class_id¶: A set of values that define for which classes (modalities) to build topic model. This repeated field must have the same length as class_weight.

ModelConfig.class_weight¶: A set of values that define the weights of the corresponding classes (modalities). This repeated field must have the same length as class_id. This value is optional, use an empty list to set equal weights for all classes.
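The length constraints on the paired repeated fields above can be checked client-side before sending a ModelConfig. This Python sketch works on a plain dict rather than a messages_pb2.ModelConfig; the helper name is hypothetical, and the value 1.0 used for "equal weights" is an assumption, since the text only says the weights are equal.

```python
def check_model_config(cfg):
    """Validate paired repeated fields of a ModelConfig-like dict (sketch).

    Returns ({regularizer_name: tau}, {class_id: weight}).
    """
    names = cfg.get("regularizer_name", [])
    taus = cfg.get("regularizer_tau", [])
    if len(names) != len(taus):
        raise ValueError("regularizer_name and regularizer_tau differ in length")
    class_id = cfg.get("class_id", [])
    class_weight = cfg.get("class_weight", [])
    # An empty class_weight list means equal weights for all classes.
    if class_weight and len(class_weight) != len(class_id):
        raise ValueError("class_id and class_weight differ in length")
    weights = class_weight or [1.0] * len(class_id)  # 1.0 is an assumed value
    return dict(zip(names, taus)), dict(zip(class_id, weights))
```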

ModelConfig.use_sparse_bow¶

A flag indicating whether to use sparse representation of the Bag-of-words data. The default setting (use_sparse_bow = true) is best suited for processing textual collections where every token is represented in a small fraction of all documents. Dense representation (use_sparse_bow = false) is better suited to non-textual collections (for example, for matrix factorization).

Note that class_weight and class_id must not be used together with use_sparse_bow=false.

ModelConfig.use_random_theta¶: A flag indicating whether to initialize p(t|d) distribution with random uniform distribution. The default setting (use_random_theta = false) sets p(t|d) = 1/T, where T stands for topics_count. Note that reuse_theta flag takes priority over use_random_theta flag, so that if reuse_theta = true and there is a cache entry from previous iteration the cache entry will be used regardless of use_random_theta flag.
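The priority between reuse_theta and use_random_theta can be summarized in a small Python sketch for a single document. The function init_theta is hypothetical, and normalizing the random values so that they sum to 1 is an assumption made here so that the result is a proper distribution.

```python
import random


def init_theta(topics_count, cached=None, reuse_theta=False,
               use_random_theta=False, rng=random):
    """Initial p(t|d) vector for one document (sketch of the documented priority)."""
    if reuse_theta and cached is not None:
        # A cache entry from the previous iteration wins over use_random_theta.
        return list(cached)
    if use_random_theta:
        values = [rng.random() for _ in range(topics_count)]
        total = sum(values)
        return [v / total for v in values]  # normalization is an assumption
    # Default: p(t|d) = 1/T for every topic.
    return [1.0 / topics_count] * topics_count
```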

ModelConfig.use_new_tokens¶: A flag indicating whether to automatically include new tokens into the topic model. This setting is set to true by default. As a result, every new token observed in batches is automatically incorporated into the topic model during the next model synchronization (ArtmSynchronizeModel()). The n_wt_ weights for new tokens are randomly generated from the [0, 1] range.

ModelConfig.opt_for_avx¶

An experimental flag that allows disabling AVX optimization in the processor. By default this option is enabled because on average it gives about a 40% speedup on physical hardware. You may want to disable this option if you are running on Windows inside a virtual machine, or when BigARTM performance degrades from iteration to iteration.

This option does not affect the results, and is only intended for advanced users experimenting with BigARTM performance.

RegularizerConfig¶

class messages_pb2.RegularizerConfig¶

Represents a configuration of a general regularizer.

message RegularizerConfig {
  enum Type {
    SmoothSparseTheta = 0;
    SmoothSparsePhi = 1;
    DecorrelatorPhi = 2;
    LabelRegularizationPhi = 4;
  }

  optional string name = 1;
  optional Type type = 2;
  optional bytes config = 3;
}

RegularizerConfig.name¶: A value that defines the name of the regularizer. The name must be unique across all names defined in the master component.

RegularizerConfig.type¶

A value that defines the type of the regularizer.

`SmoothSparseTheta`	Smooth-sparse regularizer for theta matrix
`SmoothSparsePhi`	Smooth-sparse regularizer for phi matrix
`DecorrelatorPhi`	Decorrelator regularizer for phi matrix
`LabelRegularizationPhi`	Label regularizer for phi matrix

RegularizerConfig.config¶: A serialized protobuf message that describes regularizer config for the specific regularizer type.

SmoothSparseThetaConfig¶

class messages_pb2.SmoothSparseThetaConfig¶

Represents a configuration of a SmoothSparse Theta regularizer.

message SmoothSparseThetaConfig {
  repeated string topic_name = 1;
  repeated float alpha_iter = 2;
}

SmoothSparseThetaConfig.topic_name¶: A set of topic names that defines which topics in the model should be regularized. This value is optional, use an empty list to regularize all topics.

SmoothSparseThetaConfig.alpha_iter¶

A field of the same length as ModelConfig.inner_iterations_count that defines the relative regularization weight for every inner iteration. The actual regularization value is calculated as the product of alpha_iter[i] and ModelConfig.regularizer_tau.

To specify different regularization weight for different topics create multiple regularizers with different topic_name set, and use different values of ModelConfig.regularizer_tau.
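The coefficient arithmetic, including the "several regularizers with different topic_name sets" pattern, can be sketched in Python. The helper theta_coefficients is hypothetical, and summing the contributions of regularizers that cover the same topic is an assumption made for illustration.

```python
def theta_coefficients(all_topics, regularizers, inner_iterations_count):
    """Map topic -> per-inner-iteration coefficient alpha_iter[i] * tau (sketch).

    regularizers: list of (topic_name, alpha_iter, tau) triples, one per
    SmoothSparseTheta regularizer enabled in the model.
    """
    coeffs = {t: [0.0] * inner_iterations_count for t in all_topics}
    for topic_name, alpha_iter, tau in regularizers:
        if len(alpha_iter) != inner_iterations_count:
            raise ValueError("alpha_iter must have inner_iterations_count elements")
        for t in (topic_name or all_topics):  # empty list = all topics
            for i, alpha in enumerate(alpha_iter):
                coeffs[t][i] += alpha * tau   # summing overlaps is an assumption
    return coeffs
```

For example, topics t0 and t1 can be sparsed with a negative tau while a background topic t2 is smoothed with a positive one.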

SmoothSparsePhiConfig¶

class messages_pb2.SmoothSparsePhiConfig¶

Represents a configuration of a SmoothSparse Phi regularizer.

message SmoothSparsePhiConfig {
  repeated string topic_name = 1;
  repeated string class_id = 2;
  optional string dictionary_name = 3;
}

SmoothSparsePhiConfig.topic_name¶: A set of topic names that defines which topics in the model should be regularized. This value is optional, use an empty list to regularize all topics.

SmoothSparsePhiConfig.class_id¶: This set defines which classes in the model should be regularized. This value is optional, use an empty list to regularize all classes.

SmoothSparsePhiConfig.dictionary_name¶

An optional value defining the name of the dictionary to use. The entries of the dictionary are expected to have DictionaryEntry.key_token, DictionaryEntry.class_id and DictionaryEntry.value fields. The actual regularization value will be calculated as a product of DictionaryEntry.value and ModelConfig.regularizer_tau.

This value is optional; if no dictionary is specified, then all tokens will be regularized with the same weight.

DecorrelatorPhiConfig¶

class messages_pb2.DecorrelatorPhiConfig¶

Represents a configuration of a Decorrelator Phi regularizer.

message DecorrelatorPhiConfig {
  repeated string topic_name = 1;
  repeated string class_id = 2;
}

DecorrelatorPhiConfig.topic_name¶: A set of topic names that defines which topics in the model should be regularized. This value is optional, use an empty list to regularize all topics.

DecorrelatorPhiConfig.class_id¶: This set defines which classes in the model should be regularized. This value is optional, use an empty list to regularize all classes.

LabelRegularizationPhiConfig¶

class messages_pb2.LabelRegularizationPhiConfig¶

Represents a configuration of a Label Regularization Phi regularizer.

message LabelRegularizationPhiConfig {
  repeated string topic_name = 1;
  repeated string class_id = 2;
  optional string dictionary_name = 3;
}

LabelRegularizationPhiConfig.topic_name¶: A set of topic names that defines which topics in the model should be regularized.

LabelRegularizationPhiConfig.class_id¶: This set defines which classes in the model should be regularized. This value is optional, use an empty list to regularize all classes.

LabelRegularizationPhiConfig.dictionary_name¶: An optional value defining the name of the dictionary to use.

RegularizerInternalState¶

class messages_pb2.RegularizerInternalState¶

Represents an internal state of a general regularizer.

message RegularizerInternalState {
  enum Type {
    MultiLanguagePhi = 5;
  }

  optional string name = 1;
  optional Type type = 2;
  optional bytes data = 3;
}

DictionaryConfig¶

class messages_pb2.DictionaryConfig¶

Represents a static dictionary.

message DictionaryConfig {
  optional string name = 1;
  repeated DictionaryEntry entry = 2;
  optional int32 total_token_count = 3;
  optional int32 total_items_count = 4;
}

DictionaryConfig.name¶: A value that defines the name of the dictionary. The name must be unique across all dictionaries defined in the master component.

DictionaryConfig.entry¶: A list of all entries of the dictionary.

DictionaryConfig.total_token_count¶: A sum of DictionaryEntry.token_count across all entries in this dictionary. The value is optional and might be missing when none of the entries in the dictionary carry the DictionaryEntry.token_count attribute.

DictionaryConfig.total_items_count¶: A sum of DictionaryEntry.items_count across all entries in this dictionary. The value is optional and might be missing when none of the entries in the dictionary carry the DictionaryEntry.items_count attribute.
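A Python sketch of how the two totals relate to the entries, using plain dicts in place of DictionaryEntry messages. The helper dictionary_totals is hypothetical; None stands for "field left unset".

```python
def dictionary_totals(entries):
    """Compute total_token_count / total_items_count for dictionary entries (sketch).

    Each entry is a dict that may carry "token_count" and/or "items_count".
    A total is None (left unset) when no entry carries the attribute.
    """
    totals = {}
    for key in ("token_count", "items_count"):
        present = [e[key] for e in entries if key in e]
        totals["total_" + key] = sum(present) if present else None
    return totals
```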

DictionaryEntry¶

class messages_pb2.DictionaryEntry¶

Represents one entry in a static dictionary.

message DictionaryEntry {
  optional string key_token = 1;
  optional string class_id = 2;
  optional float value = 3;
  repeated string value_tokens = 4;
  optional FloatArray values = 5;
  optional int32 token_count = 6;
  optional int32 items_count = 7;
}

DictionaryEntry.key_token¶: A token that defines the key of the entry.

DictionaryEntry.class_id¶: The class of the DictionaryEntry.key_token.

DictionaryEntry.value¶: An optional generic value, associated with the entry. The meaning of this value depends on the usage of the dictionary.

DictionaryEntry.token_count¶: An optional value, indicating the overall number of token occurrences in some collection.

DictionaryEntry.items_count¶: An optional value, indicating the overall number of documents containing the token.

ScoreConfig¶

class messages_pb2.ScoreConfig¶

Represents a configuration of a general score.

message ScoreConfig {
  enum Type {
    Perplexity = 0;
    SparsityTheta = 1;
    SparsityPhi = 2;
    ItemsProcessed = 3;
    TopTokens = 4;
    ThetaSnippet = 5;
    TopicKernel = 6;
  }

  optional string name = 1;
  optional Type type = 2;
  optional bytes config = 3;
}

ScoreConfig.name¶: A value that defines the name of the score. The name must be unique across all names defined in the master component.

ScoreConfig.type¶

A value that defines the type of the score.

`Perplexity`	Defines a config of the Perplexity score
`SparsityTheta`	Defines a config of the SparsityTheta score
`SparsityPhi`	Defines a config of the SparsityPhi score
`ItemsProcessed`	Defines a config of the ItemsProcessed score
`TopTokens`	Defines a config of the TopTokens score
`ThetaSnippet`	Defines a config of the ThetaSnippet score
`TopicKernel`	Defines a config of the TopicKernel score

ScoreConfig.config¶: A serialized protobuf message that describes score config for the specific score type.

ScoreData¶

class messages_pb2.ScoreData¶

Represents a general result of score calculation.

message ScoreData {
  enum Type {
    Perplexity = 0;
    SparsityTheta = 1;
    SparsityPhi = 2;
    ItemsProcessed = 3;
    TopTokens = 4;
    ThetaSnippet = 5;
    TopicKernel = 6;
  }

  optional string name = 1;
  optional Type type = 2;
  optional bytes data = 3;
}

ScoreData.name¶: A value that describes the name of the score. This name will match the name of the corresponding score config.

ScoreData.type¶

A value that defines the type of the score.

`Perplexity`	Defines a Perplexity score data
`SparsityTheta`	Defines a SparsityTheta score data
`SparsityPhi`	Defines a SparsityPhi score data
`ItemsProcessed`	Defines an ItemsProcessed score data
`TopTokens`	Defines a TopTokens score data
`ThetaSnippet`	Defines a ThetaSnippet score data
`TopicKernel`	Defines a TopicKernel score data

ScoreData.data¶: A serialized protobuf message that provides the specific score result.

PerplexityScoreConfig¶

class messages_pb2.PerplexityScoreConfig¶

Represents a configuration of a perplexity score.

message PerplexityScoreConfig {
  enum Type {
    UnigramDocumentModel = 0;
    UnigramCollectionModel = 1;
  }

  optional string field_name = 1 [default = "@body"];  // obsolete in BigARTM v0.5.8
  optional string stream_name = 2 [default = "@global"];
  optional Type model_type = 3 [default = UnigramDocumentModel];
  optional string dictionary_name = 4;
  optional float theta_sparsity_eps = 5 [default = 1e-37];
  repeated string theta_sparsity_topic_name = 6;
}

PerplexityScoreConfig.field_name¶: Obsolete in BigARTM v0.5.8

PerplexityScoreConfig.stream_name¶: A value that defines which stream should be used in perplexity calculation.

PerplexityScore¶

class messages_pb2.PerplexityScore¶

Represents a result of calculation of a perplexity score.

message PerplexityScore {
  optional double value = 1;
  optional double raw = 2;
  optional double normalizer = 3;
  optional int32 zero_words = 4;
  optional double theta_sparsity_value = 5;
  optional int32 theta_sparsity_zero_topics = 6;
  optional int32 theta_sparsity_total_topics = 7;
}

PerplexityScore.value¶: A perplexity value which is calculated as exp(-raw/normalizer).

PerplexityScore.raw¶: A numerator of perplexity calculation. This value is equal to the log-likelihood of the topic model.

PerplexityScore.normalizer¶: A denominator of perplexity calculation. This value is equal to the total number of tokens in all processed items.

PerplexityScore.zero_words¶: A number of tokens that have zero probability p(w|t,d) in a document. Such tokens are evaluated based on the unigram document model or the unigram collection model.

PerplexityScore.theta_sparsity_value¶: A fraction of zero entries in the theta matrix.
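The relation value = exp(-raw/normalizer) can be checked with a small Python sketch. It takes already computed p(w|d) values as input and is not BigARTM's implementation; the input layout, the function name perplexity, and counting zero_words once per (document, token) pair are assumptions. The fallback probability stands in for the unigram document or collection model.

```python
import math


def perplexity(docs):
    """Compute PerplexityScore-like fields (sketch).

    docs: list of documents; each document is a list of
    (count, p_model, p_unigram) tuples, one per distinct token, where
    p_model = p(w|d) = sum over t of p(w|t) * p(t|d), and p_unigram is a
    fallback probability from a unigram model (hypothetical input).
    """
    raw = 0.0       # log-likelihood, the numerator
    normalizer = 0  # total number of tokens, the denominator
    zero_words = 0
    for doc in docs:
        for count, p_model, p_unigram in doc:
            if p_model > 0:
                p = p_model
            else:
                # Zero model probability: fall back to the unigram model.
                p = p_unigram
                zero_words += 1  # counted once per (document, token) pair here
            raw += count * math.log(p)
            normalizer += count
    return {"value": math.exp(-raw / normalizer), "raw": raw,
            "normalizer": normalizer, "zero_words": zero_words}
```

If every token occurrence has probability 0.5, the perplexity is exactly 2, which is a convenient sanity check.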

SparsityThetaScoreConfig¶

class messages_pb2.SparsityThetaScoreConfig¶

Represents a configuration of a theta sparsity score.

message SparsityThetaScoreConfig {
  optional string field_name = 1 [default = "@body"];  // obsolete in BigARTM v0.5.8
  optional string stream_name = 2 [default = "@global"];
  optional float eps = 3 [default = 1e-37];
  repeated string topic_name = 4;
}

SparsityThetaScoreConfig.field_name¶: Obsolete in BigARTM v0.5.8

SparsityThetaScoreConfig.stream_name¶: A value that defines which stream should be used in theta sparsity calculation.

SparsityThetaScoreConfig.eps¶: A small value that defines zero threshold for theta probabilities. Theta values below the threshold will be counted as zeros when calculating theta sparsity score.

SparsityThetaScoreConfig.topic_name¶: A set of topic names that defines which topics should be used for score calculation. The names correspond to ModelConfig.topic_name. This value is optional, use an empty list to calculate the score for all topics.

SparsityThetaScore¶

class messages_pb2.SparsityThetaScore¶

Represents a result of calculation of a theta sparsity score.

message SparsityThetaScore {
  optional double value = 1;
  optional int32 zero_topics = 2;
  optional int32 total_topics = 3;
}

SparsityThetaScore.value¶: A value of theta sparsity that is calculated as zero_topics / total_topics.

SparsityThetaScore.zero_topics¶: A numerator of theta sparsity score. A number of topics that have zero probability in a topic-item distribution.

SparsityThetaScore.total_topics¶: A denominator of theta sparsity score. A total number of topics in the topic-item distributions that are used in theta sparsity calculation.
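How eps, topic_name, zero_topics and total_topics fit together can be shown with a Python sketch over an in-memory theta matrix. The function sparsity_theta is hypothetical; topic indices stand in for topic names, and returning 0.0 when nothing is counted is an assumption.

```python
def sparsity_theta(theta, eps=1e-37, topic_indices=None):
    """SparsityThetaScore-like fraction of near-zero p(t|d) values (sketch).

    theta: list of per-item topic distributions. topic_indices restricts the
    calculation to a subset of topics (None = all topics, like an empty
    topic_name list).
    """
    zero_topics = total_topics = 0
    for dist in theta:
        idx = range(len(dist)) if topic_indices is None else topic_indices
        for t in idx:
            total_topics += 1
            if dist[t] < eps:  # values below eps count as zeros
                zero_topics += 1
    # Returning 0.0 for an empty selection is an assumption.
    value = zero_topics / total_topics if total_topics else 0.0
    return {"value": value, "zero_topics": zero_topics, "total_topics": total_topics}
```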

SparsityPhiScoreConfig¶

class messages_pb2.SparsityPhiScoreConfig¶

Represents a configuration of a sparsity phi score.

message SparsityPhiScoreConfig {
  optional float eps = 1 [default = 1e-37];
  optional string class_id = 2;
  repeated string topic_name = 3;
}

SparsityPhiScoreConfig.eps¶: A small value that defines zero threshold for phi probabilities. Phi values below the threshold will be counted as zeros when calculating phi sparsity score.

SparsityPhiScoreConfig.class_id¶: A value that defines the class of tokens to use for score calculation. This value corresponds to ModelConfig.class_id field. This value is optional. By default the score will be calculated for the default class (‘@default_class’).

SparsityPhiScoreConfig.topic_name¶: A set of topic names that defines which topics should be used for score calculation. This value is optional, use an empty list to calculate the score for all topics.

SparsityPhiScore¶

class messages_pb2.SparsityPhiScore¶

Represents a result of calculation of a phi sparsity score.

message SparsityPhiScore {
  optional double value = 1;
  optional int32 zero_tokens = 2;
  optional int32 total_tokens = 3;
}

SparsityPhiScore.value¶: A value of phi sparsity that is calculated as zero_tokens / total_tokens.

SparsityPhiScore.zero_tokens¶: A numerator of phi sparsity score. A number of tokens that have zero probability in a token-topic distribution.

SparsityPhiScore.total_tokens¶: A denominator of phi sparsity score. A total number of tokens in the token-topic distributions that are used in phi sparsity calculation.

ItemsProcessedScoreConfig¶

class messages_pb2.ItemsProcessedScoreConfig¶

Represents a configuration of an items processed score.

message ItemsProcessedScoreConfig {
  optional string field_name = 1 [default = "@body"];  // obsolete in BigARTM v0.5.8
  optional string stream_name = 2 [default = "@global"];
}

ItemsProcessedScoreConfig.field_name¶: Obsolete in BigARTM v0.5.8

ItemsProcessedScoreConfig.stream_name¶: A value that defines which stream should be used in calculation of processed items.

ItemsProcessedScore¶

class messages_pb2.ItemsProcessedScore¶

Represents a result of calculation of an items processed score.

message ItemsProcessedScore {
  optional int32 value = 1;
}

ItemsProcessedScore.value¶: A number of items that belong to the stream ItemsProcessedScoreConfig.stream_name and have been processed during iterations. Currently this number is aggregated throughout all iterations.

TopTokensScoreConfig¶

class messages_pb2.TopTokensScoreConfig¶

Represents a configuration of a top tokens score.

message TopTokensScoreConfig {
  optional int32 num_tokens = 1 [default = 10];
  optional string class_id = 2;
  repeated string topic_name = 3;
}

TopTokensScoreConfig.num_tokens¶: A value that defines how many top tokens should be retrieved for each topic.

TopTokensScoreConfig.class_id¶

A value that defines for which class of the model to collect top tokens. This value corresponds to ModelConfig.class_id field.

This parameter is optional. By default tokens will be retrieved for the default class (‘@default_class’).

TopTokensScoreConfig.topic_name¶

A set of values that represent the names of the topics to include in the result. The names correspond to ModelConfig.topic_name.

This parameter is optional. By default top tokens will be calculated for all topics in the model.

TopTokensScore¶

class messages_pb2.TopTokensScore¶

Represents a result of calculation of a top tokens score.

message TopTokensScore {
  optional int32 num_entries = 1;
  repeated string topic_name = 2;
  repeated int32 topic_index = 3;
  repeated string token = 4;
  repeated float weight = 5;
}

The data in this score is represented in a table-like format, sorted on topic_index. The following code block gives a typical usage example. The loop below is guaranteed to process all top-N tokens for the first topic, then for the second topic, etc.

for (int i = 0; i < top_tokens_score.num_entries(); i++) {
  // Gives an index from 0 to (model_config.topics_count() - 1)
  int topic_index = top_tokens_score.topic_index(i);

  // Gives one of the topN tokens for topic 'topic_index'
  std::string token = top_tokens_score.token(i);

  // Gives the weight of the token
  float weight = top_tokens_score.weight(i);
}

TopTokensScore.num_entries¶: A value indicating the overall number of entries in the score. All the remaining repeated fields in this score will have this length.

TopTokensScore.token¶: A repeated field of num_entries elements, containing tokens with high probability.

TopTokensScore.weight¶: A repeated field of num_entries elements, containing the p(w|t) probabilities.

TopTokensScore.topic_index¶: A repeated field of num_entries elements, containing integers between 0 and (ModelConfig.topics_count - 1).

TopTokensScore.topic_name¶: A repeated field of num_entries elements, corresponding to the values of ModelConfig.topic_name field.
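In Python, the same flat table can be regrouped into one list of (token, weight) pairs per topic. This sketch takes a plain dict with the repeated fields rather than a parsed messages_pb2.TopTokensScore; the helper group_top_tokens is hypothetical. Because entries are sorted on topic_index, each list comes out in the order the tokens appear in the score.

```python
from collections import defaultdict


def group_top_tokens(score):
    """Turn the flat TopTokensScore table into {topic_name: [(token, weight), ...]}.

    `score` holds num_entries and the parallel repeated fields topic_name,
    token and weight, each of length num_entries.
    """
    per_topic = defaultdict(list)
    for i in range(score["num_entries"]):
        per_topic[score["topic_name"][i]].append((score["token"][i], score["weight"][i]))
    return dict(per_topic)
```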

ThetaSnippetScoreConfig¶

class messages_pb2.ThetaSnippetScoreConfig¶

Represents a configuration of a theta snippet score.

message ThetaSnippetScoreConfig {
  optional string field_name = 1 [default = "@body"];  // obsolete in BigARTM v0.5.8
  optional string stream_name = 2 [default = "@global"];
  repeated int32 item_id = 3 [packed = true];  // obsolete in BigARTM v0.5.8
  optional int32 item_count = 4 [default = 10];
}

ThetaSnippetScoreConfig.field_name¶: Obsolete in BigARTM v0.5.8

ThetaSnippetScoreConfig.stream_name¶: A value that defines which stream should be used in calculation of a theta snippet.

ThetaSnippetScoreConfig.item_id¶: Obsolete in BigARTM v0.5.8.

ThetaSnippetScoreConfig.item_count¶: The number of items to retrieve. ThetaSnippetScore will select the last item_count processed items and return their theta vectors.
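"Keep the last item_count processed items" is naturally modeled by a bounded deque. This Python sketch only illustrates that behavior; the class ThetaSnippet and its method names are hypothetical and are not part of BigARTM.

```python
from collections import deque


class ThetaSnippet:
    """Keep theta vectors of the last item_count processed items (sketch)."""

    def __init__(self, item_count=10):
        # A deque with maxlen silently drops the oldest entry when full.
        self._items = deque(maxlen=item_count)

    def on_item_processed(self, item_id, theta):
        self._items.append((item_id, list(theta)))

    def result(self):
        """Return (item_id list, values list) shaped like ThetaSnippetScore."""
        return [i for i, _ in self._items], [v for _, v in self._items]
```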

ThetaSnippetScore¶

class messages_pb2.ThetaSnippetScore¶

Represents a result of calculation of a theta snippet score.

message ThetaSnippetScore {
  repeated int32 item_id = 1;
  repeated FloatArray values = 2;
}

ThetaSnippetScore.item_id¶: A set of item ids for which the theta snippet has been calculated.

ThetaSnippetScore.values¶: A set of values that define topic probabilities for each item. The number of these repeated values matches the number of item ids in ThetaSnippetScore.item_id. Each element is a FloatArray of topic probabilities in the natural order of topic ids.

TopicKernelScoreConfig¶

class messages_pb2.TopicKernelScoreConfig¶

Represents a configuration of a topic kernel score.

message TopicKernelScoreConfig {
  optional float eps = 1 [default = 1e-37];
  optional string class_id = 2;
  repeated string topic_name = 3;
  optional double probability_mass_threshold = 4 [default = 0.1];
}

Kernel of a topic t is defined as the list of all tokens w such that the probability p(t | w) exceeds the probability mass threshold.
Kernel size of a topic t is defined as the number of tokens in its kernel.
Topic purity of a topic t is defined as the sum of p(w | t) across all tokens w in the kernel.
Topic contrast of a topic t is defined as the sum of p(t | w) across all tokens w in the kernel divided by the size of the kernel.
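The four definitions above translate directly into Python. This sketch assumes that p(w|t) and p(t|w) are derived from raw counts n_wt as n_wt / n_t and n_wt / n_w. That derivation is an assumption about the inputs, and topic_kernel is a hypothetical helper, not BigARTM's score implementation.

```python
def topic_kernel(n_wt, topic, threshold=0.1):
    """Kernel size, purity and contrast for one topic (sketch).

    n_wt: dict token -> list of counts, one per topic.
    """
    # n_t: total count mass of the topic, used to normalize p(w|t).
    n_t = sum(counts[topic] for counts in n_wt.values())
    kernel = []
    for w, counts in n_wt.items():
        n_w = sum(counts)
        # w is in the kernel if p(t|w) = n_wt / n_w exceeds the threshold.
        if n_w > 0 and counts[topic] / n_w > threshold:
            kernel.append(w)
    size = len(kernel)
    # Purity: sum of p(w|t) over the kernel.
    purity = sum(n_wt[w][topic] / n_t for w in kernel) if n_t else 0.0
    # Contrast: mean of p(t|w) over the kernel.
    contrast = (sum(n_wt[w][topic] / sum(n_wt[w]) for w in kernel) / size
                if size else 0.0)
    return size, purity, contrast
```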

TopicKernelScoreConfig.eps¶: Defines the minimum threshold on kernel size. In most cases this parameter should be kept at the default value.

TopicKernelScoreConfig.class_id¶: A value that defines the class of tokens to use for score calculation. This value corresponds to ModelConfig.class_id field. This value is optional. By default the score will be calculated for the default class (‘@default_class’).

TopicKernelScoreConfig.topic_name¶: A set of topic names that defines which topics should be used for score calculation. This value is optional, use an empty list to calculate the score for all topics.

TopicKernelScoreConfig.probability_mass_threshold¶: Defines the probability mass threshold (see the definition of kernel above).

TopicKernelScore¶

class messages_pb2.TopicKernelScore¶

Represents a result of calculation of a topic kernel score.

message TopicKernelScore {
  optional DoubleArray kernel_size = 1;
  optional DoubleArray kernel_purity = 2;
  optional DoubleArray kernel_contrast = 3;
  optional double average_kernel_size = 4;
  optional double average_kernel_purity = 5;
  optional double average_kernel_contrast = 6;
}

TopicKernelScore.kernel_size¶: Provides the kernel size for all requested topics. The length of this DoubleArray is always equal to the overall number of topics. The values of -1 correspond to non-calculated topics. The remaining values carry the kernel size of the requested topics.

TopicKernelScore.kernel_purity¶: Provides the kernel purity for all requested topics. The length of this DoubleArray is always equal to the overall number of topics. The values of -1 correspond to non-calculated topics. The remaining values carry the kernel purity of the requested topics.

TopicKernelScore.kernel_contrast¶: Provides the kernel contrast for all requested topics. The length of this DoubleArray is always equal to the overall number of topics. The values of -1 correspond to non-calculated topics. The remaining values carry the kernel contrast of the requested topics.

TopicKernelScore.average_kernel_size¶: Provides the average kernel size across all the requested topics.

TopicKernelScore.average_kernel_purity¶: Provides the average kernel purity across all the requested topics.

TopicKernelScore.average_kernel_contrast¶: Provides the average kernel contrast across all the requested topics.
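Since the per-topic arrays always have one entry per topic, with -1 marking topics that were not requested, a client that wants to recompute an average must skip those placeholders. The sketch below assumes that the average_kernel_* fields are plain arithmetic means over the requested topics; average_over_requested is a hypothetical helper, and returning 0.0 when no topic was requested is an arbitrary choice of the sketch.

```python
from typing import List


def average_over_requested(per_topic: List[float]) -> float:
    """Average a per-topic kernel statistic, skipping -1 placeholders.

    per_topic mirrors TopicKernelScore.kernel_size (or kernel_purity,
    kernel_contrast): one entry per topic, with -1 for topics that
    were not calculated.
    """
    requested = [value for value in per_topic if value != -1]
    return sum(requested) / len(requested) if requested else 0.0


print(average_over_requested([12.0, -1.0, 8.0]))  # → 10.0
```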

TopicModel¶

class messages_pb2.TopicModel¶

Represents a topic model. This message can contain data in either dense or sparse format. The key idea behind sparse format is to avoid storing zero p(w|t) elements of the Phi matrix. Please refer to the description of TopicModel.topic_index field for more details.

To distinguish between these two formats, check whether the repeated field TopicModel.topic_index is empty. An empty field indicates a dense format; otherwise the message contains data in a sparse format. To request a topic model in a sparse format, set the GetTopicModelArgs.use_sparse_format field to True when calling ArtmRequestTopicModel().

message TopicModel {
  enum OperationType {
    Initialize = 0;
    Increment = 1;
    Overwrite = 2;
    Remove = 3;
    Ignore = 4;
  }

  optional string name = 1 [default = "@model"];
  optional int32 topics_count = 2;
  repeated string topic_name = 3;
  repeated string token = 4;
  repeated FloatArray token_weights = 5;
  repeated string class_id = 6;

  message TopicModelInternals {
    repeated FloatArray n_wt = 1;
    repeated FloatArray r_wt = 2;
  }

  optional bytes internals = 7;  // obsolete in BigARTM v0.6.3
  repeated IntArray topic_index = 8;
  repeated OperationType operation_type = 9;
}

TopicModel.name¶: A value that describes the name of the topic model (TopicModel.name).

TopicModel.topics_count¶: A value that describes the number of topics in this message.

TopicModel.topic_name¶: A value that describes the names of the topics included in the given TopicModel message. These values represent a subset of topics, defined by the GetTopicModelArgs.topic_name field. If GetTopicModelArgs.topic_name is empty, these values correspond to the entire set of topics, defined in the ModelConfig.topic_name field.

TopicModel.token¶: The set of all tokens, included in the topic model.

TopicModel.token_weights¶: A set of token weights. The length of this repeated field will match the length of the repeated field TopicModel.token. The length of each FloatArray will match the TopicModel.topics_count field (in dense representation), or the length of the corresponding IntArray from TopicModel.topic_index field (in sparse representation).

TopicModel.class_id¶: A set of values that specify the class (modality) of the tokens. The length of this repeated field will match the length of the repeated field TopicModel.token.

TopicModel.internals¶: Obsolete in BigARTM v0.6.3.

TopicModel.topic_index¶: A repeated field used for sparse topic model representation. This field has the same length as TopicModel.token, TopicModel.class_id and TopicModel.token_weights. Each element in topic_index is an instance of the IntArray message, containing a list of values between 0 and the length of the TopicModel.topic_name field minus one. These values correspond to the indices in the TopicModel.topic_name array, and tell which topics have non-zero p(w|t) probabilities for a given token. The actual p(w|t) values can be found in the TopicModel.token_weights field. The length of each IntArray message in the TopicModel.topic_index field equals the length of the corresponding FloatArray message in the TopicModel.token_weights field.
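The relation between topic_index and token_weights is easiest to see by expanding a sparse message back into dense rows. The sketch below assumes the repeated fields have already been copied into plain Python lists (one list per IntArray or FloatArray); to_dense is a hypothetical helper, not part of the BigARTM API. An empty topic_index is treated as the dense format, as described above.

```python
from typing import List


def to_dense(topics_count: int,
             topic_index: List[List[int]],
             token_weights: List[List[float]]) -> List[List[float]]:
    """Expand TopicModel.token_weights into one dense row per token.

    An empty topic_index means the message is already dense, so the
    weights are returned unchanged (as copies).
    """
    if not topic_index:
        return [list(row) for row in token_weights]

    dense = []
    for indices, weights in zip(topic_index, token_weights):
        if len(indices) != len(weights):
            raise ValueError("topic_index and token_weights rows must have equal length")
        row = [0.0] * topics_count
        for t, p_wt in zip(indices, weights):
            row[t] = p_wt  # t is an index into TopicModel.topic_name
        dense.append(row)
    return dense


# Two tokens, three topics: the first token is present in topics 0 and 2.
print(to_dense(3, [[0, 2], [1]], [[0.4, 0.1], [0.5]]))
```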

Warning

Be careful with TopicModel.topic_index when this message represents a subset of topics, defined by GetTopicModelArgs.topic_name. In this case indices correspond to the selected subset of topics, which might not correspond to topic indices in the original ModelConfig message.

TopicModel.operation_type¶

A set of values that define operation to perform on each token when topic model is used as an argument of ArtmOverwriteTopicModel().

`Initialize`	Indicates that a new token should be added to the topic model. The initial `n_wt` counter will be initialized with a random value from the `[0, 1]` range. `TopicModel.token_weights` is ignored. This operation is ignored if the token already exists.
`Increment`	Indicates that `n_wt` counter of the token should be increased by values, specified in `TopicModel.token_weights` field. A new token will be created if it does not exist yet.
`Overwrite`	Indicates that `n_wt` counter of the token should be set to the value, specified in `TopicModel.token_weights` field. A new token will be created if it does not exist yet.
`Remove`	Indicates that the token should be removed from the topic model. `TopicModel.token_weights` is ignored.
`Ignore`	Indicates no operation for the token. The effect is the same as if the token is not present in this message.
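To make the table concrete, here is a minimal in-memory simulation of the five operation types. It only mirrors the documented semantics and is not BigARTM code: apply_operations is hypothetical, tokens are keyed by (class_id, token), and treating a token created by `Increment` as starting from zero counters is an assumption of the sketch.

```python
import random
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (class_id, token)


def apply_operations(n_wt: Dict[Key, List[float]],
                     updates: List[Tuple[Key, str, List[float]]],
                     topics_count: int,
                     rng: random.Random) -> None:
    """Apply (key, operation, weights) updates to an in-memory n_wt table.

    This simulates the documented semantics of TopicModel.operation_type;
    it is not BigARTM code.
    """
    for key, operation, weights in updates:
        if operation == "Initialize":
            if key not in n_wt:  # ignored for existing tokens
                n_wt[key] = [rng.random() for _ in range(topics_count)]
        elif operation == "Increment":
            # Assumption: a new token starts from zero counters.
            current = n_wt.setdefault(key, [0.0] * topics_count)
            n_wt[key] = [c + w for c, w in zip(current, weights)]
        elif operation == "Overwrite":
            n_wt[key] = list(weights)
        elif operation == "Remove":
            n_wt.pop(key, None)
        elif operation == "Ignore":
            pass  # same as if the token were absent from the message
        else:
            raise ValueError(f"unknown operation: {operation}")


model = {("@default_class", "dog"): [1.0, 2.0]}
apply_operations(model, [(("@default_class", "dog"), "Increment", [0.5, 0.5])],
                 topics_count=2, rng=random.Random(0))
print(model)  # → {('@default_class', 'dog'): [1.5, 2.5]}
```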

ThetaMatrix¶

class messages_pb2.ThetaMatrix¶

Represents a theta matrix. This message can contain data in either dense or sparse format. The key idea behind sparse format is to avoid storing zero p(t|d) elements of the Theta matrix. Sparse representation of Theta matrix is equivalent to sparse representation of Phi matrix. Please refer to TopicModel for a detailed description of the sparse format.

message ThetaMatrix {
  optional string model_name = 1 [default = "@model"];
  repeated int32 item_id = 2;
  repeated FloatArray item_weights = 3;
  repeated string topic_name = 4;
  optional int32 topics_count = 5;
  repeated string item_title = 6;
  repeated IntArray topic_index = 7;
}

ThetaMatrix.model_name¶: A value that describes the name of the topic model. This name will match the name of the corresponding model config.

ThetaMatrix.item_id¶: A set of item IDs corresponding to Item.id values.

ThetaMatrix.item_weights¶: A set of item weights (the p(t|d) values of each item). The length of this repeated field will match the length of the repeated field ThetaMatrix.item_id. The length of each FloatArray will match the ThetaMatrix.topics_count field (in dense representation), or the length of the corresponding IntArray from ThetaMatrix.topic_index field (in sparse representation).

ThetaMatrix.topic_name¶: A value that describes the names of the topics included in the given ThetaMatrix message. These values represent a subset of topics, defined by the GetThetaMatrixArgs.topic_name field. If GetThetaMatrixArgs.topic_name is empty, these values correspond to the entire set of topics, defined in the ModelConfig.topic_name field.

ThetaMatrix.topics_count¶: A value that describes the number of topics in this message.

ThetaMatrix.item_title¶: A set of item titles, corresponding to Item.title values. Beware that this field might be empty (i.e. of zero length) if none of the items had a title specified in Item.title.

ThetaMatrix.topic_index¶: A repeated field used for sparse theta matrix representation. This field has the same length as ThetaMatrix.item_id, ThetaMatrix.item_weights and ThetaMatrix.item_title. Each element in topic_index is an instance of the IntArray message, containing a list of values between 0 and the length of the ThetaMatrix.topic_name field minus one. These values correspond to the indices in the ThetaMatrix.topic_name array, and tell which topics have non-zero p(t|d) probabilities for a given item. The actual p(t|d) values can be found in the ThetaMatrix.item_weights field. The length of each IntArray message in the ThetaMatrix.topic_index field equals the length of the corresponding FloatArray message in the ThetaMatrix.item_weights field.
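Resolving indices through ThetaMatrix.topic_name (rather than through ModelConfig) keeps results correct even when only a subset of topics was requested. The sketch below assumes the message fields have been copied into plain Python lists; theta_by_item is a hypothetical helper that handles both the dense format (empty topic_index) and the sparse format.

```python
from typing import Dict, List


def theta_by_item(item_id: List[int],
                  topic_name: List[str],
                  item_weights: List[List[float]],
                  topic_index: List[List[int]]) -> Dict[int, Dict[str, float]]:
    """Map each item id to {topic name: p(t|d)} for dense or sparse ThetaMatrix data."""
    result = {}
    for i, item in enumerate(item_id):
        weights = item_weights[i]
        # Sparse format: topic_index[i] lists positions in topic_name.
        # Dense format (empty topic_index): positions are 0..len(weights)-1.
        indices = topic_index[i] if topic_index else range(len(weights))
        result[item] = {topic_name[t]: w for t, w in zip(indices, weights)}
    return result


# Sparse example: item 2 only has a non-zero weight for topic "c".
print(theta_by_item([1, 2], ["a", "b", "c"], [[0.5, 0.5], [1.0]], [[0, 1], [2]]))
```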

Warning

Be careful with ThetaMatrix.topic_index when this message represents a subset of topics, defined by GetThetaMatrixArgs.topic_name. In this case indices correspond to the selected subset of topics, which might not correspond to topic indices in the original ModelConfig message.

CollectionParserConfig¶

class messages_pb2.CollectionParserConfig¶

Represents a configuration of a collection parser.

message CollectionParserConfig {
  enum Format {
    BagOfWordsUci = 0;
    MatrixMarket = 1;
  }

  optional Format format = 1 [default = BagOfWordsUci];
  optional string docword_file_path = 2;
  optional string vocab_file_path = 3;
  optional string target_folder = 4;
  optional string dictionary_file_name = 5;
  optional int32 num_items_per_batch = 6 [default = 1000];
  optional string cooccurrence_file_name = 7;
  repeated string cooccurrence_token = 8;
  optional bool use_unity_based_indices = 9 [default = true];
}

CollectionParserConfig.format¶

A value that defines the format of a collection to be parsed.

BagOfWordsUci

A bag-of-words collection, stored in UCI format. UCI format requires two files, vocab.*.txt and docword.*.txt, whose locations are defined by vocab_file_path and docword_file_path respectively. The docword.*.txt file consists of 3 header lines (D, the number of documents; W, the number of words in the vocabulary; NNZ, the number of non-zero counts), followed by NNZ triples:

D
W
NNZ
docID wordID count
docID wordID count
...
docID wordID count

The file must be sorted on docID. Values of wordID must be unity-based (not zero-based), unless use_unity_based_indices is set to False. The vocab.*.txt file contains one token per line: the token on line n has wordID = n. Note that tokens must not contain spaces or tabs. In the vocab.*.txt file it is also possible to specify Batch.class_id for tokens, as shown in this example:

token1 @default_class
token2 custom_class
token3 @default_class
token4

Use a space or a tab to separate a token from its class. Tokens that are not followed by a class label automatically get ‘@default_class’ as a label (see ‘token4’ in the example).

MatrixMarket

See the description at http://math.nist.gov/MatrixMarket/formats.html. In this mode the parameter docword_file_path must refer to a file in Matrix Market format. The parameter vocab_file_path is also required and must refer to a dictionary file exported in gensim format (dictionary.save_as_text()).
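The BagOfWordsUci layout can be illustrated with a small reader. This is a minimal sketch of the format as described above, not BigARTM's own parser (which is part of the library); parse_vocab and parse_docword are hypothetical helpers, and the unity_based flag plays the role of use_unity_based_indices.

```python
from typing import Dict, List, Tuple


def parse_vocab(text: str) -> List[Tuple[str, str]]:
    """Parse vocab.*.txt: one token per line, optionally followed by a class id."""
    entries = []
    for line in text.splitlines():
        parts = line.split()  # splits on spaces and tabs
        if not parts:
            continue
        class_id = parts[1] if len(parts) > 1 else "@default_class"
        entries.append((parts[0], class_id))
    return entries


def parse_docword(text: str, unity_based: bool = True) -> Dict[int, Dict[int, int]]:
    """Parse docword.*.txt into {docID: {zero-based word index: count}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    num_docs, num_words, nnz = map(int, lines[:3])  # D, W, NNZ header lines
    triples = lines[3:]
    if len(triples) != nnz:
        raise ValueError(f"expected {nnz} triples, found {len(triples)}")
    offset = 1 if unity_based else 0
    docs: Dict[int, Dict[int, int]] = {}
    previous_doc = None
    for line in triples:
        doc_id, word_id, count = map(int, line.split())
        if previous_doc is not None and doc_id < previous_doc:
            raise ValueError("docword file must be sorted on docID")
        previous_doc = doc_id
        word_index = word_id - offset
        if not 0 <= word_index < num_words:
            raise ValueError(f"wordID {word_id} is out of range")
        docs.setdefault(doc_id, {})[word_index] = count
    if len(docs) > num_docs:
        raise ValueError("more documents than declared in the header")
    return docs


vocab = parse_vocab("token1 @default_class\ntoken2\tcustom_class\ntoken3\n")
docs = parse_docword("2\n3\n4\n1 1 2\n1 3 1\n2 2 5\n2 3 1\n")
for doc_id, counts in docs.items():
    print(doc_id, {vocab[i][0]: c for i, c in counts.items()})
```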

CollectionParserConfig.docword_file_path¶: A value that defines the disk location of a docword.*.txt file (the bag of words file in sparse format).

CollectionParserConfig.vocab_file_path¶: A value that defines the disk location of a vocab.*.txt file (the file with the vocabulary of the collection).

CollectionParserConfig.target_folder¶: A value that defines the disk location where to store all results after parsing the collection. Usually the resulting location will contain a set of batches, and a DictionaryConfig that contains all unique tokens that occur in the collection. Such a location can then be passed to MasterComponent via MasterComponentConfig.disk_path.

CollectionParserConfig.dictionary_file_name¶

A file name where to save the DictionaryConfig message that contains all unique tokens that occur in the collection. The file will be created in target_folder.

This parameter is optional. The dictionary will still be collected when this parameter is not provided, but it will only be returned as the result of ArtmRequestParseCollection and will not be stored to disk.

In the resulting dictionary each entry will have the following fields:

DictionaryEntry.key_token - the textual representation of the token,
DictionaryEntry.class_id - the label of the default class (“@DefaultClass”),
DictionaryEntry.token_count - the overall number of occurrences of the token in the collection,
DictionaryEntry.items_count - the number of documents in the collection, containing the token.
DictionaryEntry.value - the ratio between token_count and total_token_count.

Use ArtmRequestLoadDictionary method to load the resulting dictionary.
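The statistics listed above can be reproduced for a small tokenized collection. This sketch assumes documents are given as lists of tokens (rather than as docword triples) and assumes that total_token_count is the total number of token occurrences in the collection; dictionary_entries is a hypothetical helper, not part of the BigARTM API.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dictionary_entries(docs: List[List[str]]) -> Dict[str, Tuple[int, int, float]]:
    """Return {token: (token_count, items_count, value)} for a tokenized collection.

    token_count: total occurrences of the token in the collection.
    items_count: number of documents containing the token.
    value: token_count divided by the total number of token occurrences.
    """
    token_count: Counter = Counter()
    items_count: Counter = Counter()
    for doc in docs:
        token_count.update(doc)
        items_count.update(set(doc))  # count each document at most once
    total_token_count = sum(token_count.values())
    return {token: (count, items_count[token], count / total_token_count)
            for token, count in token_count.items()}


print(dictionary_entries([["a", "b", "a"], ["b", "c"]]))
```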

CollectionParserConfig.num_items_per_batch¶: A value indicating the desired number of items per batch.

CollectionParserConfig.cooccurrence_file_name¶

A file name where to save the DictionaryConfig message that contains information about co-occurrence of all pairs of tokens in the collection. The file will be created in target_folder.

This parameter is optional. No cooccurrence information will be collected if the filename is not provided.

In the resulting dictionary each entry will correspond to two tokens (‘<first>’ and ‘<second>’), and carry the information about co-occurrence of these tokens in the collection.

DictionaryEntry.key_token - a string of the form ‘<first>~<second>’, produced by concatenating the two tokens via the tilde symbol (‘~’). The <first> token is guaranteed to be lexicographically less than the <second> token.
DictionaryEntry.class_id - the label of the default class (“@DefaultClass”).
DictionaryEntry.items_count - the number of documents in the collection, containing both tokens (‘<first>’ and ‘<second>’)

Use ArtmRequestLoadDictionary method to load the resulting dictionary.

CollectionParserConfig.cooccurrence_token¶: A list of tokens to collect cooccurrence information. A cooccurrence of the pair <first>~<second> will be collected only when both tokens are present in CollectionParserConfig.cooccurrence_token.
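The key format and the cooccurrence_token filter can be sketched together. This is an illustration only, assuming tokenized documents as input; cooccurrence_items_count is a hypothetical helper, and Python's code-point string ordering stands in for the lexicographic order used by BigARTM (the two agree for ASCII tokens).

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List


def cooccurrence_items_count(docs: List[List[str]],
                             cooccurrence_token: Iterable[str]) -> Dict[str, int]:
    """Count, for each pair of tracked tokens, how many documents contain both.

    Keys follow the '<first>~<second>' convention with first < second.
    """
    tracked = set(cooccurrence_token)
    counts: Counter = Counter()
    for doc in docs:
        present = sorted(tracked.intersection(doc))
        # combinations() preserves the sorted order, so first < second.
        for first, second in combinations(present, 2):
            counts[f"{first}~{second}"] += 1
    return dict(counts)


docs = [["cat", "dog", "fish"], ["dog", "cat"], ["fish", "bird"]]
print(cooccurrence_items_count(docs, ["cat", "dog", "fish"]))
```

In the third document "bird" is not tracked, so no pair is recorded for it.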

CollectionParserConfig.use_unity_based_indices¶: A flag indicating whether to interpret indices in the docword file as unity-based or as zero-based. By default use_unity_based_indices = True, as required by the UCI bag-of-words format.

SynchronizeModelArgs¶

class messages_pb2.SynchronizeModelArgs¶

Represents an argument of synchronize model operation.

message SynchronizeModelArgs {
  optional string model_name = 1;
  optional float decay_weight = 2 [default = 0.0];
  optional bool invoke_regularizers = 3 [default = true];
  optional float apply_weight = 4 [default = 1.0];
}

SynchronizeModelArgs.model_name¶: The name of the model to be synchronized. This value is optional. When not set, all models will be synchronized with the same decay weight.

SynchronizeModelArgs.decay_weight¶

The decay_weight and apply_weight fields define how to combine the existing topic model with all increments calculated since the last ArtmSynchronizeModel(). This is best described by the following formula:

n_wt_new = n_wt_old * decay_weight + n_wt_inc * apply_weight,

where n_wt_old describes the current topic model, n_wt_inc describes the increment calculated since the last ArtmSynchronizeModel(), and n_wt_new defines the resulting topic model.

Expected values of both parameters are between 0.0 and 1.0. Here are some examples:

Combination of decay_weight=0.0 and apply_weight=1.0 states that the previous Phi matrix of the topic model will be disregarded completely, and the new Phi matrix will be formed based on new increments gathered since last model synchronize.
Combination of decay_weight=1.0 and apply_weight=1.0 states that new increments will be appended to the current Phi matrix without any decay.
Combination of decay_weight=1.0 and apply_weight=0.0 states that new increments will be disregarded, and current Phi matrix will stay unchanged.
To reproduce the Online variational Bayes for LDA algorithm by Matthew D. Hoffman, set decay_weight = 1 - rho and apply_weight = rho, where the parameter rho is defined as rho = (tau + t)^(-kappa). See Online Learning for Latent Dirichlet Allocation for further details.
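The update formula and the Online LDA schedule can be written out directly. This is a sketch of the arithmetic only, applied element-wise to a flat list of counters; synchronize and online_lda_weights are hypothetical names and do not call BigARTM. The values tau = 64 and kappa = 0.5 are illustrative, not recommended defaults.

```python
from typing import List, Tuple


def synchronize(n_wt_old: List[float], n_wt_inc: List[float],
                decay_weight: float, apply_weight: float) -> List[float]:
    """n_wt_new = n_wt_old * decay_weight + n_wt_inc * apply_weight, element-wise."""
    return [old * decay_weight + inc * apply_weight
            for old, inc in zip(n_wt_old, n_wt_inc)]


def online_lda_weights(t: int, tau: float, kappa: float) -> Tuple[float, float]:
    """Return (decay_weight, apply_weight) = (1 - rho, rho) with rho = (tau + t) ** -kappa."""
    rho = (tau + t) ** -kappa
    return 1.0 - rho, rho


# decay_weight=0.0 and apply_weight=1.0 keep only the new increment.
print(synchronize([8.0, 0.0], [0.0, 8.0], 0.0, 1.0))  # → [0.0, 8.0]

# Online LDA: rho shrinks as more synchronizations t are performed.
for t in range(3):
    decay_weight, apply_weight = online_lda_weights(t, tau=64.0, kappa=0.5)
    print(t, round(decay_weight, 4), round(apply_weight, 4))
```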

SynchronizeModelArgs.apply_weight¶: See decay_weight for the description.

SynchronizeModelArgs.invoke_regularizers¶: A flag indicating whether to invoke all phi-regularizers.

InitializeModelArgs¶

class messages_pb2.InitializeModelArgs¶

Represents an argument of ArtmInitializeModel() operation. Please refer to example14_initialize_topic_model.py for further information.

message InitializeModelArgs {
  enum SourceType {
    Dictionary = 0;
    Batches = 1;
  }

  message Filter {
    optional string class_id = 1;
    optional float min_percentage = 2;
    optional float max_percentage = 3;
    optional int32 min_items = 4;
    optional int32 max_items = 5;
    optional int32 min_total_count = 6;
    optional int32 min_one_item_count = 7;
  }

  optional string model_name = 1;
  optional string dictionary_name = 2;
  optional SourceType source_type = 3 [default = Dictionary];

  optional string disk_path = 4;
  repeated Filter filter = 5;
}

InitializeModelArgs.model_name¶: The name of the model to be initialized.

InitializeModelArgs.dictionary_name¶: The name of the dictionary containing all tokens that should be initialized.

GetTopicModelArgs¶

Represents an argument of ArtmRequestTopicModel() operation.

message GetTopicModelArgs {
  enum RequestType {
    Pwt = 0;
    Nwt = 1;
  }

  optional string model_name = 1;
  repeated string topic_name = 2;
  repeated string token = 3;
  repeated string class_id = 4;
  optional bool use_sparse_format = 5;
  optional float eps = 6 [default = 1e-37];
  optional RequestType request_type = 7 [default = Pwt];
}

GetTopicModelArgs.model_name¶: The name of the model to be retrieved.

GetTopicModelArgs.topic_name¶: The list of topic names to be retrieved. This value is optional. When not provided, all topics will be retrieved.

GetTopicModelArgs.token¶: The list of tokens to be retrieved. The length of this field must match the length of class_id field. This field is optional. When not provided, all tokens will be retrieved.

GetTopicModelArgs.class_id¶: The list of classes corresponding to all tokens. The length of this field must match the length of token field. This field is only required together with token, otherwise it is ignored.

GetTopicModelArgs.use_sparse_format¶: An optional flag that defines whether to use sparse format for the resulting TopicModel message. See TopicModel message for additional information about the sparse format. Note that setting use_sparse_format = true results in empty TopicModel.internals field.

GetTopicModelArgs.eps¶: A small value that defines zero threshold for p(w|t) probabilities. This field is only used in sparse format. p(w|t) below the threshold will be excluded from the resulting Phi matrix.

GetTopicModelArgs.request_type¶

An optional value that defines what kind of data to retrieve in this operation.

Pwt	Indicates that the resulting TopicModel message should contain `p(w|t)` probabilities. These values are normalized to form a probability distribution (`sum_w p(w|t) = 1` for all topics `t`).
Nwt	Indicates that the resulting TopicModel message should contain internal `n_wt` counters of the topic model. These values represent an internal state of the topic model.

The default setting is to retrieve p(w|t) probabilities. These probabilities are sufficient to infer p(t|d) distributions using this topic model.

n_wt counters allow you to restore the precise state of the topic model. By passing these values to the ArtmOverwriteTopicModel() operation you are guaranteed to get the model in the same state as when you retrieved it. As a result, you may continue topic model inference from the point where you stopped it last time.

p(w|t) values can also be restored via the ArtmOverwriteTopicModel() operation. The resulting model will give the same p(t|d) distributions; however, you should consider this model as read-only and should not call ArtmSynchronizeModel() on it.
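The reason p(w|t) values are not enough to continue training is that normalization discards the scale of the n_wt counters. The toy sketch below, using plain nested lists and a hypothetical normalize_columns helper (not the BigARTM API), shows two internal states with identical p(w|t) that diverge once the same increment is added to both.

```python
from typing import List


def normalize_columns(n_wt: List[List[float]]) -> List[List[float]]:
    """Turn n_wt counters into p(w|t) so that sum_w p(w|t) = 1 for every topic t."""
    n_t = [sum(column) for column in zip(*n_wt)]
    return [[n / n_t[t] if n_t[t] > 0 else 0.0 for t, n in enumerate(row)]
            for row in n_wt]


# Two different internal states (rows are tokens, columns are topics) ...
state_a = [[2.0, 1.0], [2.0, 3.0]]
state_b = [[4.0, 2.0], [4.0, 6.0]]  # every counter doubled
print(normalize_columns(state_a) == normalize_columns(state_b))  # → True

# ... diverge once the same increment is added to both.
increment = [[0.0, 4.0], [4.0, 0.0]]
after_a = [[a + i for a, i in zip(ra, ri)] for ra, ri in zip(state_a, increment)]
after_b = [[b + i for b, i in zip(rb, ri)] for rb, ri in zip(state_b, increment)]
print(normalize_columns(after_a) == normalize_columns(after_b))  # → False
```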

GetThetaMatrixArgs¶

Represents an argument of ArtmRequestThetaMatrix() operation.

message GetThetaMatrixArgs {
  optional string model_name = 1;
  optional Batch batch = 2;
  repeated string topic_name = 3;
  repeated int32 topic_index = 4;
  optional bool clean_cache = 5 [default = false];
  optional bool use_sparse_format = 6 [default = false];
  optional float eps = 7 [default = 1e-37];
}

GetThetaMatrixArgs.model_name¶: The name of the model to retrieve the theta matrix for.

GetThetaMatrixArgs.batch¶: The Batch to classify with the model.

GetThetaMatrixArgs.topic_name¶: The list of topic names, describing which topics to include in the Theta matrix. The values of this field should correspond to values in ModelConfig.topic_name. This field is optional, by default all topics will be included.

GetThetaMatrixArgs.topic_index¶

The list of topic indices, describing which topics to include in the Theta matrix. The values of this field should be integers between 0 and (ModelConfig.topics_count - 1). This field is optional; by default all topics will be included.

Note that this field acts similarly to GetThetaMatrixArgs.topic_name. It is not allowed to specify both topic_index and topic_name at the same time. The recommendation is to use topic_name.

GetThetaMatrixArgs.clean_cache¶: An optional flag that defines whether to clear the theta matrix cache after this operation. Setting this value to True will clear the cache for a topic model, defined by GetThetaMatrixArgs.model_name. This value is only applicable when MasterComponentConfig.cache_theta is set to True.

GetThetaMatrixArgs.use_sparse_format¶: An optional flag that defines whether to use sparse format for the resulting ThetaMatrix message. See ThetaMatrix message for additional information about the sparse format.

GetThetaMatrixArgs.eps¶: A small value that defines zero threshold for p(t|d) probabilities. This field is only used in sparse format. p(t|d) below the threshold will be excluded from the resulting Theta matrix.
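The effect of eps on one item can be sketched as a dense-to-sparse conversion. This assumes, following the wording above, that only values below eps are dropped (a value exactly equal to eps is kept); to_sparse is a hypothetical helper, and the returned pair mirrors one element of ThetaMatrix.topic_index and ThetaMatrix.item_weights.

```python
from typing import List, Tuple


def to_sparse(dense_row: List[float], eps: float = 1e-37) -> Tuple[List[int], List[float]]:
    """Keep only p(t|d) values that are not below eps.

    Returns (topic_index, item_weights) for one item.
    """
    topic_index = [t for t, p in enumerate(dense_row) if p >= eps]
    weights = [dense_row[t] for t in topic_index]
    return topic_index, weights


print(to_sparse([0.6, 0.0, 0.3, 0.1], eps=0.2))  # → ([0, 2], [0.6, 0.3])
```

With the default eps of 1e-37 only exact (or near) zeros are dropped; a larger eps trades precision for a smaller message.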

GetScoreValueArgs¶

Represents an argument of get score operation.

message GetScoreValueArgs {
  optional string model_name = 1;
  optional string score_name = 2;
  optional Batch batch = 3;
}

GetScoreValueArgs.model_name¶: The name of the model to retrieve the score for.

GetScoreValueArgs.score_name¶: The name of the score to retrieve.

GetScoreValueArgs.batch¶: The Batch to calculate the score for. This option is only applicable to cumulative scores. When not provided, the score will be reported for all batches processed since the last ArtmInvokeIteration().

AddBatchArgs¶

Represents an argument of ArtmAddBatch() operation.

message AddBatchArgs {
  optional Batch batch = 1;
  optional int32 timeout_milliseconds = 2 [default = -1];
  optional bool reset_scores = 3 [default = false];
  optional string batch_file_name = 4;
}

AddBatchArgs.batch¶: The Batch to add.

AddBatchArgs.timeout_milliseconds¶: Timeout in milliseconds for this operation.

AddBatchArgs.reset_scores¶: An optional flag that defines whether to reset all scores before this operation.

AddBatchArgs.batch_file_name¶: An optional value that defines the disk location of the batch to add. Exactly one of the parameters batch_file_name and batch must be specified (not both at the same time).

InvokeIterationArgs¶

Represents an argument of ArtmInvokeIteration() operation.

message InvokeIterationArgs {
  optional int32 iterations_count = 1 [default = 1];
  optional bool reset_scores = 2 [default = true];
  optional string disk_path = 3;
}

InvokeIterationArgs.iterations_count¶: An integer value describing how many iterations to invoke.

InvokeIterationArgs.reset_scores¶: An optional flag that defines whether to reset all scores before this operation.

InvokeIterationArgs.disk_path¶: A value that defines the disk location with batches to process on this iteration.

WaitIdleArgs¶

Represents an argument of ArtmWaitIdle() operation.

message WaitIdleArgs {
  optional int32 timeout_milliseconds = 1 [default = -1];
}

WaitIdleArgs.timeout_milliseconds¶: Timeout in milliseconds for this operation.

ExportModelArgs¶

Represents an argument of ArtmExportModel() operation.

message ExportModelArgs {
  optional string file_name = 1;
  optional string model_name = 2;
}

ExportModelArgs.file_name¶: The target file name in which to store the topic model.

ExportModelArgs.model_name¶: A value that describes the name of the topic model. This name will match the name of the corresponding model config.

ImportModelArgs¶

Represents an argument of ArtmImportModel() operation.

message ImportModelArgs {
  optional string file_name = 1;
  optional string model_name = 2;
}

ImportModelArgs.file_name¶: The file name from which to load the topic model.

ImportModelArgs.model_name¶: A value that describes the name of the topic model. This name will match the name of the corresponding model config.