Basic BigARTM tutorial for Linux and Mac OS-X users

Currently there is no distribution package of BigARTM for Linux. BigARTM had been tested on several Linux OS, and it is known to work well, but you have to get the source code and compile it locally on your machine.

Download sources and build

Clone the latest BigARTM code from our github repository, and build it via CMake as in the following script.

sudo apt-get install git make cmake build-essential libboost-all-dev
cd ~
git clone --branch=stable https://github.com/bigartm/bigartm.git
cd bigartm
mkdir build && cd build
cmake ..
make

Running BigARTM from command line

There is a simple utility bigartm, which allows you to run BigARTM from command line. To experiment with this tool you need a small dataset, which you can get via the following script. More datasets are available through Downloads page.

cd ~/bigartm
mkdir datasets && cd datasets
wget https://s3-eu-west-1.amazonaws.com/artm/docword.kos.txt.gz
wget https://s3-eu-west-1.amazonaws.com/artm/vocab.kos.txt
gunzip docword.kos.txt.gz
../build/src/bigartm/bigartm -d docword.kos.txt -v vocab.kos.txt

Configure BigARTM Python API

For more advanced scenarios you need to configure Python interface for BigARTM. To use BigARTM from Python you need to use Google Protobuf. We recommend to use ‘protobuf 2.5.1-pre’, included in bigartm/3rdparty.

# Step 1 - add BigARTM python bindings to PYTHONPATH
export PYTHONPATH=~/bigartm/python:$PYTHONPATH

# Step 2 - install google protobuf
cd ~/bigartm
cp build/3rdparty/protobuf-cmake/protoc/protoc 3rdparty/protobuf/src/
cd 3rdparty/protobuf/python
python setup.py build
sudo python setup.py install

# Step 3 - point ARTM_SHARED_LIBRARY variable to libartm.so (libartm.dylib) location
export ARTM_SHARED_LIBRARY=~/bigartm/build/src/artm/libartm.so        # for linux
export ARTM_SHARED_LIBRARY=~/bigartm/build/src/artm/libartm.dylib     # for Mac OS X

At this point you may run examples under ~/bigartm/python/examples.

Troubleshooting

>python setup.py build
  File "setup.py", line 52
        print "Generating %s..." % output

SyntaxError: Missing parentheses in call to `print`

This error may happen during google protobuf installation. It indicates that you are using Python 3, which is not supported by BigARTM. (see this question on StackOverflow for more details on the error around print). Please use Python 2.7.9 to workaround this issue.

ubuntu@192.168.0.1:~/bigartm/python/examples$ python example01_synthetic_collection.py
Traceback (most recent call last):
  File "example01_synthetic_collection.py", line 6, in <module>
    import artm.messages_pb2, artm.library, random, uuid
ImportError: No module named artm.messages_pb2

This error indicate that python is unable to locate messages_pb2.py and ``library.py files. Please verify if you executed Step #1 in the instructions above.

ubuntu@192.168.0.1:~/bigartm/python/examples$ python example01_synthetic_collection.py
Traceback (most recent call last):
  File "example01_synthetic_collection.py", line 6, in <module>
    import artm.messages_pb2, artm.library, random, uuid
  File "/home/ubuntu/bigartm/python/messages_pb2.py", line 4, in <module>
    from google.protobuf import descriptor as _descriptor
ImportError: No module named google.protobuf

This error indicated that python is unable to locate protobuf library. Please verify if you executed Step #2 in the instructions above. If you do not have permissions to execute sudo python setup.py install step, you may also try to update PYTHONPATH manually: PYTHONPATH="/home/ubuntu/bigartm/3rdparty/protobuf/python:/home/ubuntu/bigartm/python:$PYTHONPATH".

ubuntu@192.168.0.1:~/bigartm/python/examples$ python example01_synthetic_collection.py
libartm.so: cannot open shared object file: No such file or directory,
fall back to ARTM_SHARED_LIBRARY environment variable
Traceback (most recent call last):
  File "example01_synthetic_collection.py", line 27, in <module>
    with artm.library.MasterComponent() as master:
  File "/home/ubuntu/bigartm/python/artm/library.py", line 179, in __init__
    lib = Library().lib_
  File "/home/ubuntu/bigartm/python/artm/library.py", line 107, in __init__
    self.lib_ = ctypes.CDLL(os.environ['ARTM_SHARED_LIBRARY'])
  File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'ARTM_SHARED_LIBRARY'

This error indicate that BigARTM’s python interface can not locate libartm.so (libartm.dylib) files. Please verify if you executed Step #3 correctly.

BigARTM on Travis-CI

To get a live usage example of BigARTM you may check BigARTM’s .travis.yml script and the latest continuous integration build.