BigARTM FAQ

Can I use BigARTM from other programming languages (not Python)?

Yes, as long as your language has an implementation of Google Protocol Buffers (the list can be found here). Note that Google officially supports C++, Python and Java.

The following figure shows how to call BigARTM methods directly on artm.dll (Windows) or artm.so (Linux).

Connecting to BigARTM from different languages

To write your own API, please refer to the Plain C interface of BigARTM.

How to retrieve Theta matrix from BigARTM

The Theta matrix contains the distribution of items (columns of the matrix) over topics (rows of the matrix). There are three ways to retrieve this information from BigARTM, and the correct way depends on your scenario.
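To make the layout concrete, here is a minimal sketch in plain Python. The numbers and the helper names (item_distribution, dominant_topic) are made up for illustration and are not part of the BigARTM API; the only thing taken from the description above is that topics are rows and items are columns, so each column is one item's distribution over topics.

```python
# Hypothetical Theta matrix: 3 topics (rows) x 4 items (columns).
# The values are illustrative only; they are not BigARTM output.
theta = [
    [0.7, 0.1, 0.2, 0.50],  # topic 0
    [0.2, 0.8, 0.3, 0.25],  # topic 1
    [0.1, 0.1, 0.5, 0.25],  # topic 2
]


def item_distribution(theta, item_index):
    """Return one column: the weights of a single item over all topics."""
    return [row[item_index] for row in theta]


def dominant_topic(theta, item_index):
    """Return the row index of the topic with the largest weight for an item."""
    column = item_distribution(theta, item_index)
    return max(range(len(column)), key=column.__getitem__)


# Each column is a distribution, so its entries should add up to 1
# (checked with a tolerance because of floating-point rounding).
for i in range(len(theta[0])):
    assert abs(sum(item_distribution(theta, i)) - 1.0) < 1e-9
```

Reading the matrix by column answers "which topics does this item belong to", which is the view each of the retrieval scenarios returns.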

  1. You want to get the Theta matrix for the same collection that you used to infer the topic model.

    Set MasterComponentConfig.cache_theta to true prior to the last iteration, and after the iteration use MasterComponent::GetThetaMatrix() (in C++) or MasterComponent.GetThetaMatrix (in Python) to retrieve the Theta matrix. Note that the cache_theta flag does not work together with the network modus operandi. The current workaround for the network modus operandi is to use option 3 (described below).

  2. You want to repeatedly monitor a small portion of the Theta matrix during ongoing iterations.

    In this case, create a Theta Snippet score, defined via ThetaSnippetScoreConfig, and then use MasterComponent::GetScoreAs<T>() to retrieve the resulting ThetaSnippetScore message.

    The configuration of the Theta Snippet score requires you to provide ThetaSnippetScoreConfig.item_id, listing the IDs of all items whose Theta values should be collected. If you created the batches manually, you should have specified these IDs in the Item.id field. If you used other methods to parse the collection from disk, you should try using sequential IDs, starting with 1.

    Remember that the Theta Snippet score is designed to handle only a small number of items. Attempting to retrieve 100+ items will have a negative effect on performance.

  3. You want to classify a new set of items with an existing model.

    In this case you need to create a Batch containing your new items. Then copy this batch to the GetThetaMatrixArgs.batch message, specify GetThetaMatrixArgs.model_name, and use MasterComponent::GetThetaMatrix() (in C++) or MasterComponent.GetThetaMatrix (in Python) to retrieve the Theta matrix. In this case there is no need to set MasterComponentConfig.cache_theta to true.
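For option 2, the list of item IDs has to be prepared before it is placed in ThetaSnippetScoreConfig.item_id. The sketch below only builds that list in plain Python and does not call BigARTM. The helper name sequential_item_ids and the constant MAX_SNIPPET_ITEMS are hypothetical, and treating the "100+ items" warning as a hard limit is an assumption of this sketch, not a rule enforced by the library.

```python
# Assumed cap, derived from the "100+ items" performance warning in option 2.
MAX_SNIPPET_ITEMS = 100


def sequential_item_ids(count, first_id=1):
    """Build `count` sequential item IDs, starting with `first_id` (1 by default).

    Raises ValueError for a negative count, or for a count that reaches
    the assumed cap, so that an oversized snippet is caught early.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if count >= MAX_SNIPPET_ITEMS:
        raise ValueError(
            "snippet of %d items is too large; keep it below %d"
            % (count, MAX_SNIPPET_ITEMS)
        )
    return list(range(first_id, first_id + count))


# IDs for the first five items of a collection parsed from disk.
ids = sequential_item_ids(5)
```

If the batches were created manually, use the values that were written to Item.id instead of a generated sequence.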

Check example11_get_theta_matrix.py for further examples.