Python Interface¶
This document describes all classes in the Python interface of the BigARTM library.
Library¶
-
class
artm.library.
Library
(artm_shared_library = "")¶ Creates an ArtmLibrary object, wrapping the BigARTM shared library.
The artm_shared_library is an optional argument that provides the full file name of the artm shared library (a disk path plus
artm.dll
on Windows or artm.so
on Linux). When artm_shared_library is not specified, the shared library is searched for in the folders listed in the PATH
system variable. You may also set the ARTM_SHARED_LIBRARY
system variable to provide the full file name of the artm shared library.
-
CreateMasterComponent
(config = None)¶ Creates and returns an instance of
MasterComponent
class. config defines an optional MasterComponentConfig parameter that may carry the configuration of the master component.
-
ParseCollection
(collection_parser_config)¶ Parses a text collection as defined by collection_parser_config (CollectionParserConfig). Returns an instance of DictionaryConfig, which carries all unique words in the collection and their frequencies.
For more information refer to
ArtmRequestParseCollection()
and ArtmRequestLoadDictionary()
.
-
LoadDictionary
(full_filename)¶ Loads a DictionaryConfig from the file, defined by full_filename argument.
For more information refer to
ArtmRequestLoadDictionary()
.
-
LoadBatch
(full_filename)¶ Loads a Batch from the file, defined by full_filename argument.
For more information refer to
ArtmRequestLoadBatch()
.
-
ParseCollectionOrLoadDictionary
(docword_file_path, vocab_file_path, target_folder)¶ A simple helper method that runs
ParseCollection()
when target_folder is empty; otherwise it tries to use LoadDictionary()
to load the dictionary from target_folder.
The docword_file_path and vocab_file_path arguments should provide the disk location of docword and vocab files of the collection to be parsed.
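The branching described for ParseCollectionOrLoadDictionary() can be sketched as follows. This is an illustration under assumptions, not the library's implementation: "empty" is taken to mean that the folder is missing or has no entries, and the names parse_or_load, folder_is_empty, make_parser_config and dictionary_filename are hypothetical. library stands for any object exposing the ParseCollection() and LoadDictionary() methods documented above.

```python
import os


def folder_is_empty(path):
    """Return True when `path` is missing or has no entries (assumed meaning of 'empty')."""
    return not os.path.isdir(path) or not os.listdir(path)


def parse_or_load(library, make_parser_config, dictionary_filename, target_folder):
    """Parse the collection when the folder is empty, otherwise load a saved dictionary."""
    if folder_is_empty(target_folder):
        # Nothing has been parsed yet: build a CollectionParserConfig and parse.
        return library.ParseCollection(make_parser_config())
    # The folder already has content: load the dictionary from the given file.
    return library.LoadDictionary(dictionary_filename)
```

Passing make_parser_config as a callable keeps the sketch independent of how the CollectionParserConfig is filled in, which this document does not specify.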
-
MasterComponent¶
-
class
artm.library.
MasterComponent
(config = None, lib = None, disk_path = None)¶ Creates a master component.
config is an optional instance of MasterComponentConfig, providing an initial configuration of the master component.
lib is an optional argument pointing to
Library
. When not specified, a default library will be used. Check the constructor of Library
for more details.
disk_path is an optional value providing the disk folder with batches to be processed by this master component. Changing disk_path is not supported (you must create a new instance of MasterComponent to do so).
InvokeIteration()
processes all batches located under disk_path. Alternatively, use AddBatch()
to add a specific batch to the processor queue.
-
Dispose
()¶ Disposes the master component and releases all unmanaged resources.
-
config
()¶ Returns current MasterComponentConfig of the master component.
-
CreateModel
(config=None, topics_count=None, inner_iterations_count=None, class_ids=None, class_weights=None, topic_names=None, use_sparse_format=None, request_type=None)¶ Creates and returns an instance of
Model
class based on a given ModelConfig. Note that the model has to be further tuned by several iterative scans over the text collection. Use InvokeIteration()
to perform such scans.
The remaining parameters override the corresponding values specified in config.
-
RemoveModel
(model)¶ Removes an instance of
Model
from the master component. After this operation the model object becomes invalid and must not be used.
-
CreateRegularizer
(name, type, config)¶ Creates and returns an instance of
Regularizer
component. name can be any unique identifier that you can later use to identify the regularizer (for example, in ModelConfig.regularizer_name
). type can be any regularizer type (for example, RegularizerConfig_Type_DirichletTheta
). config can be any regularizer config (for example, a SmoothSparseThetaConfig).
-
CreateSmoothSparseThetaRegularizer
(name = None, config = None)¶ Creates an instance of SmoothSparseThetaRegularizer. config is an optional argument of SmoothSparseThetaConfig type.
-
CreateSmoothSparsePhiRegularizer
(name = None, config = None, topic_names=None, class_ids=None)¶ Creates an instance of SmoothSparsePhiRegularizer. config is an optional argument of SmoothSparsePhiConfig type.
-
CreateDecorrelatorPhiRegularizer
(name = None, config = None, topic_names=None, class_ids=None)¶ Creates an instance of DecorrelatorPhiRegularizer. config is an optional argument of DecorrelatorPhiConfig type.
-
RemoveRegularizer
(regularizer)¶ Removes an instance of
Regularizer
from the master component. After this operation the regularizer object becomes invalid and must not be used.
-
CreateScore
(name, type, config)¶ Creates a score calculator inside the master component. name can be any unique identifier that you can later use to identify the score (for example, in
ModelConfig.score_name
). type can be any score type (for example, ScoreConfig_Type_Perplexity
). config can be any score config (for example, a PerplexityScoreConfig).
-
CreatePerplexityScore
(name = None, config = None, stream_name = None, class_ids=None)¶ Creates an instance of PerplexityScore. config is an optional argument of PerplexityScoreConfig type.
-
CreateSparsityThetaScore
(name = None, config = None, topic_names=None)¶ Creates an instance of SparsityThetaScore. config is an optional argument of SparsityThetaScoreConfig type.
-
CreateSparsityPhiScore
(name = None, config = None, topic_names=None, class_id=None)¶ Creates an instance of SparsityPhiScore. config is an optional argument of SparsityPhiScoreConfig type.
-
CreateItemsProcessedScore
(name = None, config = None)¶ Creates an instance of ItemsProcessedScore. config is an optional argument of ItemsProcessedScoreConfig type.
-
CreateTopTokensScore
(name = None, config = None, num_tokens = None, class_id = None, topic_names=None)¶ Creates an instance of TopTokensScore. config is an optional argument of TopTokensScoreConfig type.
-
CreateThetaSnippetScore
(name = None, config = None)¶ Creates an instance of ThetaSnippetScore. config is an optional argument of ThetaSnippetScoreConfig type.
-
CreateTopicKernelScore
(name = None, config = None, topic_names=None, class_id=None)¶ Creates an instance of TopicKernelScore. config is an optional argument of TopicKernelScoreConfig type.
-
RemoveScore
(name)¶ Removes a score calculator with the specific name from the master component.
-
CreateDictionary
(config)¶ Creates and returns an instance of
Dictionary
class component with a specific DictionaryConfig.
-
RemoveDictionary
(dictionary)¶ Removes an instance of
Dictionary
from the master component. After this operation the dictionary object becomes invalid and must not be used.
-
Reconfigure
(config = None)¶ Updates the configuration of the master component with new MasterComponentConfig value, provided by config parameter. Remember that some changes of the configuration are not allowed (for example, the
MasterComponentConfig.disk_path
must not change). Such configuration parameters must be provided in the constructor of MasterComponent
.
-
AddBatch
(batch = None, batch_filename = None, timeout = None, reset_scores = False, args=None)¶ Adds an instance of Batch class to the processor queue. The master component creates a copy of the batch, so any further changes to the batch object will not be picked up. batch_filename is an alternative that names a file with a binary-serialized batch (you must use either the batch or the batch_filename option, but not both at the same time).
This operation waits until there is enough space in the processor queue. It returns True if the wait succeeded within the timeout, otherwise it returns False. The provided timeout is in milliseconds. By default it allows an infinite time for the
AddBatch()
operation.
args is an optional argument of AddBatchArgs type.
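As a minimal sketch of the timeout behaviour described above, the helper below queues batch files one by one and collects those that AddBatch() reported as not queued. queue_batches and the 1000 ms default are hypothetical; master is assumed to be any object exposing the AddBatch() method with the signature documented here.

```python
def queue_batches(master, batch_filenames, timeout_ms=1000):
    """Queue each batch file; return the file names that were not queued in time."""
    rejected = []
    for filename in batch_filenames:
        # AddBatch returns False when no space became available within the timeout.
        if not master.AddBatch(batch_filename=filename, timeout=timeout_ms):
            rejected.append(filename)
    return rejected
```

The returned list can be retried later, for example after a call to WaitIdle().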
-
InvokeIteration
(iterations_count = 1, disk_path = None, args=None)¶ Invokes several iterations over the collection. The recommended value for iterations_count is 1. disk_path defines the disk location with batches to process on this iteration. For more iterations, use a for loop around the
InvokeIteration()
method. This operation is asynchronous. Use WaitIdle()
to wait until all iterations have completed.
args is an optional argument of InvokeIterationArgs type.
-
WaitIdle
(timeout = None, args=None)¶ Waits for ongoing iterations. Returns True if the iterations finished within the timeout, otherwise returns False. The provided timeout is in milliseconds. Use timeout = -1 to allow infinite time for the
WaitIdle()
operation. Remember to call the Model.Synchronize()
operation to synchronize each model that you are currently processing.
args is an optional argument of WaitIdleArgs type.
-
RemoveStream
(stream_name)¶ Removes a stream with the specific name from the master component.
-
GetTopicModel
(model = None, args = None)¶ Retrieves and returns an instance of TopicModel class, carrying all the data of the topic model (including the Phi matrix). Parameter model should be an instance of
Model
class. For more settings use args parameter (see GetTopicModelArgs for all available options).
-
GetRegularizerState
(regularizer_name)¶ Retrieves and returns the internal state of a regularizer with the specific name.
-
GetThetaMatrix
(model = None, batch = None, clean_cache = None, args = None)¶ Retrieves an instance of ThetaMatrix class. The content depends on batch parameter. When batch is provided, the resulting ThetaMatrix will contain theta values estimated for all documents in the batch. When batch is not provided, the resulting ThetaMatrix will contain theta values gathered during the last iteration.
Parameter model should be an instance of
Model
class. For more settings use args parameter (see GetThetaMatrixArgs for all available options).
When used without batch, this operation requires
MasterComponentConfig.cache_theta
to be set to True before starting the last iteration. In this case the entire ThetaMatrix must fit into CPU memory, and for this reason MasterComponentConfig.cache_theta
is turned off by default.
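The cache_theta requirement can be guarded against in calling code. The sketch below is one possible wrapper, with get_theta as a hypothetical name; it only inspects the current configuration returned by config(), so it cannot verify that cache_theta was already True when the last iteration started.

```python
def get_theta(master, model, batch=None):
    """Return theta values for `batch`, or the values cached during the last iteration."""
    if batch is None and not master.config().cache_theta:
        # Without a batch the values come from the cache, which is off by default.
        raise ValueError("cache_theta must be set to True before the last "
                         "iteration when no batch is given")
    return master.GetThetaMatrix(model=model, batch=batch)
```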
-
Model¶
-
class
artm.library.
Model
¶ This constructor must not be used explicitly. The only correct way of creating a Model is through
MasterComponent.CreateModel()
method.
-
name
()¶ Returns the string name of the model.
-
Reconfigure
(config = None)¶ Updates the configuration of the topic model with new ModelConfig value, provided by config parameter. When config is not specified the configuration is updated with
config()
value. Remember that some changes of the configuration are not applied immediately after this call. For example, changes to ModelConfig.topics_count
or ModelConfig.topic_name
will be applied only during the next Synchronize
call.
Note that changes to
ModelConfig.topics_count
or ModelConfig.topic_name
are only supported on an idle master component (e.g. in between iterations). Changing these values during an ongoing iteration may cause unexpected results.
-
topics_count
()¶ Returns the number of topics in the model.
-
config
()¶ Returns current ModelConfig of the topic model.
-
Synchronize
(decay_weight = 0.0, apply_weight = 1.0, invoke_regularizers = True, args=None)¶ This operation updates the Phi matrix of the topic model with all model increments, collected since the last call to
Synchronize()
method. The Phi matrix is calculated according to decay_weight and apply_weight (refer to SynchronizeModelArgs.decay_weight
for more details). Depending on the invoke_regularizers parameter, this operation may also invoke all regularizers.
Remember to call the
Synchronize()
operation after every call to MasterComponent.WaitIdle()
.
For more settings use args parameter (see SynchronizeModelArgs for all available options).
-
Initialize
(dictionary = None, args=None)¶ Generates a random initial approximation for the Phi matrix of the topic model.
dictionary must be an instance of
Dictionary
class.
For more settings use args parameter (see InitializeModelArgs for all available options).
-
Export
(filename)¶ Exports topic model into a file.
-
Import
(filename)¶ Imports topic model from a file.
-
Overwrite
(topic_model, commit = True)¶ Updates the model with new Phi matrix, defined by topic_model (TopicModel). This operation can be used to provide an explicit initial approximation of the topic model, or to adjust the model in between iterations.
Depending on the commit flag the change can be applied immediately (commit = True) or queued (commit = False). The default setting is commit = True. You may want to use commit = False if your model is too big to be updated in a single protobuf message. In this case you should split your model into parts, each part containing a subset of all tokens, and then submit each part in a separate Overwrite operation with commit = False. After that remember to call
MasterComponent.WaitIdle()
and Model.Synchronize()
to propagate your change.
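The multi-part update described above could be sketched as follows. overwrite_in_parts is a hypothetical helper, and parts is assumed to be an iterable of TopicModel messages that the caller has already split by token; how to perform that split is not covered here. Passing -1 to WaitIdle() and calling Synchronize() with its default arguments are choices made for the example.

```python
def overwrite_in_parts(master, model, parts):
    """Submit each partial TopicModel uncommitted, then propagate the change."""
    submitted = 0
    for part in parts:
        model.Overwrite(part, commit=False)  # queued rather than applied immediately
        submitted += 1
    # As described above: WaitIdle, then Synchronize, to propagate the change.
    master.WaitIdle(-1)
    model.Synchronize()
    return submitted
```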
-
Enable
()¶ Sets
ModelConfig.enabled
to True for the current topic model. This means that the model will be updated on MasterComponent.InvokeIteration()
.
-
EnableScore
(score)¶ By default the model does not calculate any scores even if they are created with
MasterComponent.CreateScore()
. Method EnableScore tells the model that the score should be calculated for the model. score must be an instance of Score
class.
-
EnableRegularizer
(regularizer, tau)¶ By default the model does not use any regularizers even if they are created with
MasterComponent.CreateRegularizer()
. Method EnableRegularizer tells the model that the regularizer should be applied to the model. Parameter tau defines the regularization coefficient of the regularizer. regularizer must be an instance of Regularizer
class.
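Creating a regularizer and enabling it on a model are separate steps, which the sketch below puts together. enable_regularizers is a hypothetical helper, and it assumes that the dedicated Create...Regularizer() methods return the created Regularizer object, as CreateRegularizer() is documented to do. Suitable tau values depend on the collection and are left to the caller.

```python
def enable_regularizers(master, model, theta_tau, decorrelator_tau):
    """Create two regularizers and enable them on `model` with the given coefficients."""
    theta_regularizer = master.CreateSmoothSparseThetaRegularizer()
    decorrelator = master.CreateDecorrelatorPhiRegularizer()
    # A regularizer has no effect on a model until it is enabled for that model.
    model.EnableRegularizer(theta_regularizer, theta_tau)
    model.EnableRegularizer(decorrelator, decorrelator_tau)
    return theta_regularizer, decorrelator
```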
-
Disable
()¶ Sets
ModelConfig.enabled
to False for the current topic model. This means that the model will not be updated on MasterComponent.InvokeIteration()
, but the scores for the model will still be collected.
-
Regularizer¶
-
class
artm.library.
Regularizer
¶ This constructor must not be used explicitly. The only correct way of creating a Regularizer is through
MasterComponent.CreateRegularizer()
method (or similar methods in MasterComponent
class, dedicated to a particular type of the regularizer).
-
name
()¶ Returns the string name of the regularizer.
-
Reconfigure
(type, config)¶ Updates the configuration of the regularizer with new regularizer configuration, provided by config parameter. The config object can be, for example, of SmoothSparseThetaConfig type (or similar). The type must match the current type of the regularizer.
-
Score¶
-
class
artm.library.
Score
¶ This constructor must not be used explicitly. The only correct way of creating a Score is through
MasterComponent.CreateScore()
method (or similar methods in MasterComponent
class, dedicated to a particular type of the score).
-
name
()¶ Returns the string name of the score.
-
GetValue
(model = None, batch = None)¶ Retrieves the score for a specific model. For cumulative scores such as the Perplexity or ThetaSparsity score it is possible to use the batch argument.
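A score goes through three steps: it is created on the master component, enabled on a model, and read back with GetValue(). The sketch below strings these together. enable_scores and read_scores are hypothetical helpers, the dictionary labels are arbitrary, and the Create...Score() methods are assumed to return Score objects. The structure of the values returned by GetValue() is not described in this document, so they are passed through untouched.

```python
def enable_scores(master, model):
    """Create a perplexity and a Phi sparsity score and enable both on `model`."""
    scores = {
        "perplexity": master.CreatePerplexityScore(),
        "sparsity_phi": master.CreateSparsityPhiScore(),
    }
    for score in scores.values():
        model.EnableScore(score)
    return scores


def read_scores(scores, model):
    """Return the current value of every score, keyed by the label used above."""
    return {label: score.GetValue(model) for label, score in scores.items()}
```

read_scores() would typically be called after WaitIdle() and Model.Synchronize() have completed for the iteration of interest.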
-
Dictionary¶
-
class
artm.library.
Dictionary
(master_component, config)¶ This constructor must not be used explicitly. The only correct way of creating a Dictionary is through
MasterComponent.CreateDictionary()
method.
-
name
()¶ Returns the string name of the dictionary.
-
Reconfigure
(config)¶ Updates the configuration of the dictionary with new DictionaryConfig value, provided by config parameter.
-
Visualizers¶
Exceptions¶
-
exception
artm.library.
InternalError
¶ An exception class corresponding to
ARTM_INTERNAL_ERROR
error code.
-
exception
artm.library.
ArgumentOutOfRangeException
¶ An exception class corresponding to
ARTM_ARGUMENT_OUT_OF_RANGE
error code.
-
exception
artm.library.
InvalidMasterIdException
¶ An exception class corresponding to
ARTM_INVALID_MASTER_ID
error code.
-
exception
artm.library.
CorruptedMessageException
¶ An exception class corresponding to
ARTM_CORRUPTED_MESSAGE
error code.
-
exception
artm.library.
InvalidOperationException
¶ An exception class corresponding to
ARTM_INVALID_OPERATION
error code.
-
exception
artm.library.
DiskReadException
¶ An exception class corresponding to
ARTM_DISK_READ_ERROR
error code.
-
exception
artm.library.
DiskWriteException
¶ An exception class corresponding to
ARTM_DISK_WRITE_ERROR
error code.
Constants¶
-
artm.library.
Stream_Type_Global
¶
-
artm.library.
Stream_Type_ItemIdModulus
¶
-
artm.library.
RegularizerConfig_Type_DirichletTheta
¶
-
artm.library.
RegularizerConfig_Type_DirichletPhi
¶
-
artm.library.
RegularizerConfig_Type_SmoothSparseTheta
¶
-
artm.library.
RegularizerConfig_Type_SmoothSparsePhi
¶
-
artm.library.
RegularizerConfig_Type_DecorrelatorPhi
¶
-
artm.library.
ScoreConfig_Type_Perplexity
¶
-
artm.library.
ScoreData_Type_Perplexity
¶
-
artm.library.
ScoreConfig_Type_SparsityTheta
¶
-
artm.library.
ScoreData_Type_SparsityTheta
¶
-
artm.library.
ScoreConfig_Type_SparsityPhi
¶
-
artm.library.
ScoreData_Type_SparsityPhi
¶
-
artm.library.
ScoreConfig_Type_ItemsProcessed
¶
-
artm.library.
ScoreData_Type_ItemsProcessed
¶
-
artm.library.
ScoreConfig_Type_TopTokens
¶
-
artm.library.
ScoreData_Type_TopTokens
¶
-
artm.library.
ScoreConfig_Type_ThetaSnippet
¶
-
artm.library.
ScoreData_Type_ThetaSnippet
¶
-
artm.library.
ScoreConfig_Type_TopicKernel
¶
-
artm.library.
ScoreData_Type_TopicKernel
¶
-
artm.library.
PerplexityScoreConfig_Type_UnigramDocumentModel
¶
-
artm.library.
PerplexityScoreConfig_Type_UnigramCollectionModel
¶
-
artm.library.
CollectionParserConfig_Format_BagOfWordsUci
¶