BigARTM FAQ
Can I use BigARTM from other programming languages (not Python)?
Yes, as long as your language has an implementation of Google Protocol Buffers (the list can be found here). Note that Google officially supports C++, Python and Java.
The following figure shows how to call BigARTM methods directly on artm.dll (Windows) or artm.so (Linux).

To write your own API, refer to the Plain C interface of BigARTM.
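Here is a minimal sketch of the first step: loading the shared library from Python with the standard ctypes module. The helper names artm_library_name and load_artm are hypothetical. The file names come from the sentence above. Other platforms, such as macOS, are deliberately left unhandled because this FAQ does not name a library file for them. No BigARTM function is called here. The exported functions and their arguments are described in the Plain C interface.

```python
import ctypes
import os
import sys


def artm_library_name(platform=None):
    """Return the BigARTM shared-library file name for a sys.platform-style string.

    Hypothetical helper: only the two file names mentioned in this FAQ are known,
    so any other platform raises instead of guessing.
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        return "artm.dll"
    if platform.startswith("linux"):
        return "artm.so"
    raise OSError("no BigARTM library file name known for platform %r" % platform)


def load_artm(directory="."):
    """Load artm.dll / artm.so from `directory` (an assumed install location).

    ctypes.CDLL assumes the C calling convention; the functions it exposes must
    then be declared and called as described in the Plain C interface.
    """
    path = os.path.join(directory, artm_library_name())
    return ctypes.CDLL(path)
```

For example, artm_library_name("win32") returns "artm.dll". The platform string is a parameter, so the mapping can be checked without the library being installed.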
How to retrieve Theta matrix from BigARTM
The Theta matrix holds, for each item (a column of the matrix), its distribution over topics (the rows of the matrix). BigARTM offers three ways to retrieve this information, and the right one depends on your scenario.
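To make the layout concrete, here is a small sketch in plain Python lists. It is not a BigARTM API. The counts are made up, and theta_from_counts and item_topics are hypothetical helpers. The sketch turns topic-by-item weights into a Theta matrix whose rows are topics and whose columns are items. Each column is normalized into a distribution over topics.

```python
def theta_from_counts(counts):
    """Turn topic-by-item weights into a Theta matrix.

    counts[t][d] is a (made-up) weight of topic t in item d; rows are topics and
    columns are items. Each column of the result sums to 1.
    """
    num_topics = len(counts)
    num_items = len(counts[0])
    theta = [[0.0] * num_items for _ in range(num_topics)]
    for d in range(num_items):
        column_sum = sum(counts[t][d] for t in range(num_topics))
        for t in range(num_topics):
            # An item with no weight at all falls back to a uniform distribution.
            theta[t][d] = counts[t][d] / column_sum if column_sum > 0 else 1.0 / num_topics
    return theta


def item_topics(theta, d):
    """Return column d of Theta: the topic distribution of item d."""
    return [row[d] for row in theta]


theta = theta_from_counts([[3, 0],
                           [1, 2]])
print(theta)                  # [[0.75, 0.0], [0.25, 1.0]]
print(item_topics(theta, 0))  # [0.75, 0.25]
```

Reading a single item therefore means reading a column, not a row.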
You want to get Theta matrix for the same collection as you have used to infer the topic model.
Set MasterComponentConfig.cache_theta to true prior to the last iteration, and after the iteration use MasterComponent::GetThetaMatrix() (in C++) or MasterComponent.GetThetaMatrix (in Python) to retrieve Theta matrix.
You want to repeatedly monitor a small portion of the Theta matrix during ongoing iterations.
In this case you should create a Theta Snippet score, defined via ThetaSnippetScoreConfig, and then use MasterComponent::GetScoreAs<T>() to retrieve the resulting ThetaSnippetScore message. This configuration of the Theta Snippet score requires you to provide ThetaSnippetScoreConfig.item_id, listing the IDs of all items whose Theta values should be collected. If you created the batches manually, you should have specified such IDs in the Item.id field. If you used other methods to parse the collection from disk, try using sequential IDs starting with 1. Remember that the Theta Snippet score is designed to handle only a small number of items: attempting to retrieve 100 or more items will hurt performance.
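The ID advice above can be sketched in plain Python. The helper names sequential_item_ids and snippet_item_ids are hypothetical, and the 100-item threshold is taken from the warning above. The sketch generates sequential IDs starting with 1, picks a small subset to monitor, and warns when the subset reaches 100 items. It does not touch BigARTM. In practice the resulting list would be copied into ThetaSnippetScoreConfig.item_id. For a Python Protocol Buffers message, a repeated field accepts extend() for that copy.

```python
import warnings

# Threshold taken from the FAQ text: 100 or more snippet items hurt performance.
SNIPPET_ITEM_LIMIT = 100


def sequential_item_ids(num_items, first_id=1):
    """Sequential item IDs first_id, first_id + 1, ..., as the FAQ suggests."""
    return list(range(first_id, first_id + num_items))


def snippet_item_ids(all_ids, k):
    """Choose the first k item IDs to monitor; warn when k reaches the limit."""
    if k >= SNIPPET_ITEM_LIMIT:
        warnings.warn(
            "Theta snippet over %d items may hurt performance" % k,
            RuntimeWarning,
        )
    return list(all_ids[:k])


ids = sequential_item_ids(1000)
print(snippet_item_ids(ids, 5))  # [1, 2, 3, 4, 5]
```

Taking the first k IDs is only one possible choice. Any small, fixed subset works, as long as the same IDs are monitored across iterations.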
You want to classify a new set of items with an existing model.
In this case you need to create a Batch containing your new items. Then copy this batch to the GetThetaMatrixArgs.batch message, specify GetThetaMatrixArgs.model_name, and use MasterComponent::GetThetaMatrix() (in C++) or MasterComponent.GetThetaMatrix (in Python) to retrieve the Theta matrix. In this case there is no need to set MasterComponentConfig.cache_theta to true.
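Conceptually, classifying new items means keeping the model's word-topic probabilities (Phi) fixed and estimating only the new items' topic distributions. The sketch below shows that idea as a plain PLSA-style EM fold-in for a single item. This is an assumption about the general technique, not BigARTM's implementation: it ignores regularizers and anything else BigARTM may apply during inference. The Phi values, the vocabulary and the name infer_theta are all made up for illustration.

```python
def infer_theta(phi, doc_counts, num_iters=50):
    """Estimate p(t|d) for one new item while Phi stays fixed (plain PLSA fold-in).

    phi:        dict word -> list of p(w|t), one entry per topic (hypothetical model)
    doc_counts: dict word -> number of occurrences in the new item
    """
    num_topics = len(next(iter(phi.values())))
    theta = [1.0 / num_topics] * num_topics  # start from a uniform distribution
    for _ in range(num_iters):
        n_t = [0.0] * num_topics  # expected topic counts for this item
        for word, n_dw in doc_counts.items():
            if word not in phi:
                continue  # words outside the model vocabulary are ignored
            weights = [phi[word][t] * theta[t] for t in range(num_topics)]
            z = sum(weights)
            if z == 0.0:
                continue
            for t in range(num_topics):
                n_t[t] += n_dw * weights[t] / z  # E-step: p(t|d,w) times count
        total = sum(n_t)
        if total == 0.0:
            break  # nothing known about this item: keep the uniform start
        theta = [x / total for x in n_t]  # M-step: renormalize to a distribution
    return theta


# Made-up two-topic model; each topic's column of Phi sums to 1.
PHI = {
    "cat":   [0.5, 0.01],
    "dog":   [0.49, 0.01],
    "stock": [0.005, 0.5],
    "bond":  [0.005, 0.48],
}
theta = infer_theta(PHI, {"cat": 3, "dog": 2})
print(theta)  # topic 0 should dominate for this animal-only item
```

Only the item's Theta is updated here, which is why no cached Theta from training is needed.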
Check example11_get_theta_matrix.py for further examples.