Dictionary

This page describes the Dictionary class.

class artm.Dictionary(name=None, dictionary_path=None, data_path=None)
__init__(name=None, dictionary_path=None, data_path=None)
Parameters:
  • name (str) – name of the dictionary
  • dictionary_path (str) – if given, the constructor calls load() with this path
  • data_path (str) – if given, the constructor calls gather() with this path

Note: all parameters are optional
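
The constructor arguments can be used as in the following minimal sketch. It assumes the BigARTM Python package is importable as artm; the helper name open_dictionary, its argument names, and the guard against passing both sources are hypothetical and not part of the library.

```python
def open_dictionary(saved_path=None, batches_dir=None, name=None):
    """Build an artm.Dictionary, optionally populated at construction time.

    saved_path  -> forwarded as dictionary_path (constructor calls load())
    batches_dir -> forwarded as data_path (constructor calls gather())
    """
    import artm  # assumption: the BigARTM Python package is installed

    if saved_path is not None and batches_dir is not None:
        # Hypothetical guard: this helper expects a single data source.
        raise ValueError("pass either saved_path or batches_dir, not both")

    return artm.Dictionary(name=name,
                           dictionary_path=saved_path,
                           data_path=batches_dir)
```

Calling open_dictionary() with no arguments creates an empty dictionary that can be filled later with load() or gather().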

copy()
Description: returns a copy of the dictionary loaded in the lib, under another name.
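
One possible use of copy() is to keep the original dictionary intact while experimenting with filter(), which modifies the dictionary it is called on. The sketch below assumes only the copy() and filter() methods documented on this page; the helper name filtered_copy is hypothetical.

```python
def filtered_copy(dictionary, **thresholds):
    """Return a filtered copy, leaving `dictionary` itself untouched.

    `thresholds` are forwarded to filter(), e.g. min_df=5 or max_df_rate=0.5.
    """
    clone = dictionary.copy()    # a separate dictionary under another name
    clone.filter(**thresholds)   # only the copy is replaced by its filtered version
    return clone
```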
create(dictionary_data)
Description: creates a dictionary using a DictionaryData object
Parameters: dictionary_data (DictionaryData instance) – configuration of the dictionary
filter(class_id=None, min_df=None, max_df=None, min_df_rate=None, max_df_rate=None, min_tf=None, max_tf=None)
Description:

filters the BigARTM dictionary of the collection, which was already loaded into the lib

Parameters:
  • class_id (str) – class_id to filter
  • min_df (float) – min df value to pass the filter
  • max_df (float) – max df value to pass the filter
  • min_df_rate (float) – min df rate to pass the filter
  • max_df_rate (float) – max df rate to pass the filter
  • min_tf (float) – min tf value to pass the filter
  • max_tf (float) – max tf value to pass the filter
Note:

the current dictionary will be replaced with the filtered one
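
To make the thresholds concrete, here is a pure-Python sketch of the token statistics involved. It does not call BigARTM. It assumes the usual definitions (tf is the total number of occurrences of a token, df is the number of documents containing it, and df rate is df divided by the number of documents) and assumes inclusive bounds; the library's exact boundary behavior may differ. The names token_stats and passes are hypothetical.

```python
from collections import Counter


def token_stats(documents):
    """Return {token: (tf, df, df_rate)} for a list of tokenized documents."""
    tf = Counter()
    df = Counter()
    for doc in documents:
        tf.update(doc)        # every occurrence counts toward tf
        df.update(set(doc))   # each document counts once toward df
    n_docs = len(documents)
    return {tok: (tf[tok], df[tok], df[tok] / n_docs) for tok in tf}


def passes(stats, min_df=None, max_df_rate=None):
    """Check one token against two of the thresholds (inclusive, by assumption)."""
    _tf, df, df_rate = stats
    if min_df is not None and df < min_df:
        return False
    if max_df_rate is not None and df_rate > max_df_rate:
        return False
    return True


docs = [["topic", "model", "topic"], ["model", "data"], ["model"]]
stats = token_stats(docs)
# "model" occurs in every document (df rate 1.0), so max_df_rate=0.9 drops it.
kept = sorted(t for t, s in stats.items() if passes(s, min_df=1, max_df_rate=0.9))
print(kept)
```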

gather(data_path, cooc_file_path=None, vocab_file_path=None, symmetric_cooc_values=False)
Description:

creates the BigARTM dictionary of the collection, represented as batches, and loads it into the lib

Parameters:
  • data_path (str) – full path to batches folder
  • cooc_file_path (str) – full path to the file with cooc info
  • vocab_file_path (str) – full path to the file with vocabulary. If given, the dictionary tokens will have the same order as in this file; otherwise the order will be random
  • symmetric_cooc_values (bool) – whether the cooc matrix should be considered symmetric
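
A call to gather() might look like the sketch below. It assumes the BigARTM Python package is importable as artm and that the batches folder already exists; the helper name gather_dictionary and its argument names are hypothetical.

```python
def gather_dictionary(batches_dir, vocab_path=None, cooc_path=None, symmetric=False):
    """Create an empty dictionary and fill it from a folder of batches."""
    import artm  # assumption: the BigARTM Python package is installed

    dictionary = artm.Dictionary()
    dictionary.gather(data_path=batches_dir,
                      cooc_file_path=cooc_path,
                      # Passing a vocabulary file fixes the token order.
                      vocab_file_path=vocab_path,
                      symmetric_cooc_values=symmetric)
    return dictionary
```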
load(dictionary_path)
Description: loads the BigARTM dictionary of the collection into the lib
Parameters: dictionary_path (str) – full filename of the dictionary
load_text(dictionary_path, encoding='utf-8')
Description:

loads the BigARTM dictionary of the collection from disk in human-readable text format

Parameters:
  • dictionary_path (str) – full file name of the text dictionary file
  • encoding (str) – the encoding of the text in the dictionary file
save(dictionary_path)
Description: saves the BigARTM dictionary of the collection to disk
Parameters: dictionary_path (str) – full file name for the dictionary
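
save() and load() can be combined into a round trip, as in this minimal sketch. It assumes the BigARTM Python package is importable as artm; the helper name save_and_reload and the default file name are hypothetical.

```python
def save_and_reload(dictionary, path="collection.dict"):
    """Save `dictionary` to `path`, then load it into a fresh Dictionary."""
    import artm  # assumption: the BigARTM Python package is installed

    dictionary.save(dictionary_path=path)
    restored = artm.Dictionary()
    restored.load(dictionary_path=path)
    return restored
```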
save_text(dictionary_path, encoding='utf-8')
Description:

saves the BigARTM dictionary of the collection to disk in human-readable text format

Parameters:
  • dictionary_path (str) – full file name for the text dictionary file
  • encoding (str) – the encoding of the text in the dictionary file
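
Because the text format is human readable, one way to inspect a dictionary is to save it and read back the first few lines. The sketch below assumes only the save_text() method documented above; the helper name preview_text_dictionary is hypothetical, and the layout of the file is whatever BigARTM writes.

```python
from itertools import islice


def preview_text_dictionary(dictionary, path, encoding="utf-8", n_lines=5):
    """Save `dictionary` as text and return the first `n_lines` lines of the file."""
    dictionary.save_text(dictionary_path=path, encoding=encoding)
    # Read with the same encoding that was used for writing.
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]
```

After editing the file by hand, it can be read back with load_text() using the same encoding.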