Changes in BigARTM CLI¶
v0.9.0¶
- added option
--cooc-window
to set width of window in which tokens are considered to occurr together. - added option
--cooc-min-tf
to set minimal value of absolute co-occurrence of tokens. - added option
--cooc-min-df
to set minimal value of documental frequency of token co-occurrerence. - added option
--write-cooc-tf
to set path of output file with absolute co-occurrences. - added option
--write-cooc-df
to set path of output file with documental frequency of token co-occurrences. - added option
--write-ppmi-tf
to set path of output file with ppmi’s calculated on base of absolute co-occurrences. - added option
--write-ppmi-df
to set path of output file with ppmi’s calculated on base of documental frequences of token co-occurrences.
v0.8.2¶
- added option
--rand-seed
to initialize random number generator; without this options, RNG will be set using system time - added option
--write-vw-corpus
to convert batches into plain text file in Vowpal Wabbit format - change the naming scheme of the batches, saved with
--save-batches
option. Previously file names were guid-based, while new format will look like this:aabcde.batch
. New format ensures the ordering of the documents in the collection is be preserved, given that user scans batches alphabetically. - added switch
--guid-batch-name
to enable old naming scheme of batches (guid-based names). This option is useful if you launch multiple instances of BigARTM CLI to concurrently generate batches. - speedup parsing large files in VowpalWabbit format
- when
--use-modality
is specified, the batches saved with--save-batches
will only include tokens from these modalities. Other tokens will be ignored during parsing. This option is implemented for both VW and UCI BOW formats. - implement
TopicSelection
,LabelRegularization
,ImproveCoherence
,Biterms
regularizer in BigARTM CLI - added option
--dictionary-size
to give user an easy option to limit his dictionary size - add more diagnostics information about dictionary size (before and after filtering)
- add strict verification of scores and regularizers; for example, BigARTM CLI will raise an exception for this input:
bigartm -t obj:10,back:5 --regularizer "0.5 SparsePhi #obj*"
. There shouldn’t be star sign in#obj*
.
v0.8.0¶
- renamed
--passes
into--num-collection-passes
- renamed
--num-inner-iterations
into--num-document-passes
- removed
--model-v06
option - removed
--use-dense-bow
option