BigARTM v0.7.2 Release notes
We are happy to introduce BigARTM v0.7.2, which brings you the following changes:
- Enhancements in high-level python API (ArtmModel -> ARTM)
- Enhancements in low-level python API (library.py -> master_component.py)
- Enhancements in CLI interface (cpp_client)
- Status and information retrievals from BigARTM
- Allow float token counts (token_count -> token_weight)
- Allow custom weights for each batch (ProcessBatchesArgs.batch_weight)
- Bug fixes and cleanup in the online documentation
Enhancements in Python APIs
Note that ArtmModel has been renamed to ARTM.
The naming conventions follow the same pattern as in scikit-learn (e.g. fit, transform and fit_transform methods).
Also note that all input data is now handled by the BatchVectorizer class.
Refer to the notebooks in English and in Russian for further details about the ARTM interface.
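For readers unfamiliar with the scikit-learn naming pattern mentioned above, here is a minimal sketch of it. ToyWordCounter is a hypothetical stand-in written only for illustration; it is not BigARTM's ARTM class and does not use BatchVectorizer, so it shows only how fit, transform and fit_transform relate to each other.

```python
from collections import Counter


class ToyWordCounter:
    """Hypothetical toy estimator; NOT part of BigARTM."""

    def __init__(self):
        self.vocabulary_ = []

    def fit(self, documents):
        # Learn state from the data (here: a sorted vocabulary) and return self.
        self.vocabulary_ = sorted({tok for doc in documents for tok in doc.split()})
        return self

    def transform(self, documents):
        # Apply the learned state to data without changing it.
        rows = []
        for doc in documents:
            counts = Counter(doc.split())
            rows.append([counts[tok] for tok in self.vocabulary_])
        return rows

    def fit_transform(self, documents):
        # Convenience method: fit on the data, then transform the same data.
        return self.fit(documents).transform(documents)


docs = ["topic model topic", "model data"]
matrix = ToyWordCounter().fit_transform(docs)
```

The point of the convention is that learning (fit) and applying (transform) are separate steps, so a fitted object can be reused on new data.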
Also note that the previous low-level python API library.py is superseded by a new API, master_component.py.
For now both APIs are available, but the old one will be removed in future releases.
Refer to this folder for further examples of the new low-level python API.
Remember that any use of low-level APIs is discouraged. Our recommendation is to always use the high-level python API ARTM, and to let us know by e-mail if some functionality is not exposed there.
Enhancements in CLI interface
BigARTM command line interface cpp_client has been enhanced with the following options:
- --load_model - to load a model from file before processing
- --save_model - to save the model to a binary file after processing
- --write_model_readable - to output the model in a human-readable format (CSV)
- --write_predictions - to write predictions in a human-readable format (CSV)
- --dictionary_min_df - to filter out tokens present in less than N documents / less than P% of documents
- --dictionary_max_df - to filter out tokens present in more than N documents / more than P% of documents
- --tau0 - an option of the online algorithm, describing the weight parameter in the online update formula. Optional, defaults to 1024.
- --kappa - an option of the online algorithm, describing the exponent parameter in the online update formula. Optional, defaults to 0.7.
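The release notes do not spell out the online update formula. The sketch below assumes the schedule commonly used in online topic modeling, where the weight of the t-th update is (tau0 + t) ^ (-kappa). The function name online_update_weight is hypothetical and is not a BigARTM API; only the default values 1024 and 0.7 come from the option descriptions above.

```python
def online_update_weight(update_count, tau0=1024.0, kappa=0.7):
    """Weight given to the newest update.

    ASSUMPTION: weight = (tau0 + update_count) ** (-kappa); this is a common
    online schedule, not a formula confirmed by these release notes.
    """
    if update_count < 0:
        raise ValueError("update_count must be non-negative")
    return (tau0 + update_count) ** (-kappa)


# Weights for the first few updates with the default tau0 and kappa.
weights = [online_update_weight(t) for t in range(3)]
```

Under this assumption a larger tau0 damps the early updates, while kappa controls how quickly later updates lose influence. With the defaults, the first weight is 1024 ^ (-0.7), roughly 1/128.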
Note that the values of --dictionary_min_df and --dictionary_max_df can be given as a number, a fraction, or a percentage:
- Use a percentage % sign to specify a percentage value
- Use a floating value in the [0, 1) range to specify a fraction
- Use an integer value (1 or greater) to indicate a number
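The three rules above can be sketched as a small parser. This is an illustration of the rules as stated, not the actual cpp_client implementation; the helper name parse_df_threshold and its (kind, value) return shape are assumptions made for this example.

```python
def parse_df_threshold(text):
    """Classify a document-frequency threshold as percent, fraction or count.

    Hypothetical helper that mirrors the rules listed above.
    """
    text = text.strip()
    if text.endswith("%"):
        # A trailing percent sign marks a percentage value.
        return ("percent", float(text[:-1]))
    value = float(text)
    if 0 <= value < 1:
        # A floating value in [0, 1) is a fraction of all documents.
        return ("fraction", value)
    if value >= 1 and value.is_integer():
        # An integer of 1 or greater is an absolute number of documents.
        return ("count", int(value))
    raise ValueError("not a valid threshold: %r" % text)


examples = [parse_df_threshold(s) for s in ("5%", "0.05", "10")]
```

Values such as 1.5 match none of the three rules, so this sketch rejects them with a ValueError rather than guessing the intent.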