Basic BigARTM tutorial for Linux and Mac OS X users
Currently there is no distribution package of BigARTM for Linux. BigARTM has been tested on several Linux distributions and is known to work well, but you have to get the source code and compile it locally on your machine.
Download sources and build
Clone the latest BigARTM code from our GitHub repository and build it with CMake, as in the following script.
sudo apt-get install git make cmake build-essential libboost-all-dev
cd ~
git clone https://github.com/bigartm/bigartm.git
cd bigartm
mkdir build && cd build
cmake ..
make
Running BigARTM from command line
There is a simple utility, cpp_client, which allows you to run BigARTM from the command line.
To experiment with this tool you need a small dataset, which you can get via the following script.
More datasets are available through the Downloads page.
cd ~/bigartm
mkdir datasets && cd datasets
wget https://s3-eu-west-1.amazonaws.com/artm/docword.kos.txt.gz && gunzip docword.kos.txt.gz
wget https://s3-eu-west-1.amazonaws.com/artm/vocab.kos.txt
~/bigartm/build/src/cpp_client/cpp_client -d docword.kos.txt -v vocab.kos.txt
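Before running cpp_client it can help to confirm that the download is intact. The sketch below assumes that docword.kos.txt follows the UCI Bag of Words layout (three header lines giving the number of documents, the vocabulary size, and the number of non-zero entries, followed by "docID wordID count" triples) and that vocab.kos.txt holds one word per line. The function name summarize_bow is hypothetical and not part of BigARTM.

```python
import io


def summarize_bow(docword_file, vocab_file):
    """Read a UCI Bag of Words header and cross-check it against the data."""
    n_docs = int(docword_file.readline())
    n_words = int(docword_file.readline())
    n_nonzero = int(docword_file.readline())
    # Count the remaining non-empty "docID wordID count" lines.
    n_entries = sum(1 for line in docword_file if line.strip())
    # Count the non-empty vocabulary lines (assumed: one word per line).
    vocab_size = sum(1 for line in vocab_file if line.strip())
    return {
        "docs": n_docs,
        "words": n_words,
        "nonzero": n_nonzero,
        "entries_match": n_entries == n_nonzero,
        "vocab_match": vocab_size == n_words,
    }


if __name__ == "__main__":
    # Tiny in-memory sample so the sketch runs without the real files.
    docword = io.StringIO(u"2\n3\n4\n1 1 2\n1 3 1\n2 2 5\n2 3 1\n")
    vocab = io.StringIO(u"alpha\nbeta\ngamma\n")
    print(summarize_bow(docword, vocab))
```

To check the real dataset, open docword.kos.txt and vocab.kos.txt with open() and pass the two file objects instead of the in-memory samples.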
Configure BigARTM Python API
For more advanced scenarios you need to configure the Python interface for BigARTM. The Python interface requires Google Protobuf. We recommend using ‘protobuf 2.5.1-pre’, included in bigartm/3rdparty.
# Step 1 - add BigARTM python bindings to PYTHONPATH
export PYTHONPATH=~/bigartm/src/python:$PYTHONPATH
# Step 2 - install google protobuf
cd ~/bigartm
cp build/3rdparty/protobuf-cmake/protoc/protoc 3rdparty/protobuf/src/
cd 3rdparty/protobuf/python
python setup.py build
sudo python setup.py install
# Step 3 - point ARTM_SHARED_LIBRARY variable to libartm.so (libartm.dylib) location; run only the line for your OS
export ARTM_SHARED_LIBRARY=~/bigartm/build/src/artm/libartm.so # for linux
export ARTM_SHARED_LIBRARY=~/bigartm/build/src/artm/libartm.dylib # for Mac OS X
At this point you may run the examples under ~/bigartm/src/python/examples.
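The three steps above can be checked from Python before running an example. The following is a minimal sketch, not part of BigARTM: the function name check_bigartm_setup and its messages are hypothetical. It looks for artm/messages_pb2.py on the module search path (Step 1), tries to import google.protobuf (Step 2), and verifies that ARTM_SHARED_LIBRARY is set and points to an existing file (Step 3).

```python
import importlib
import os
import sys


def check_bigartm_setup(environ=None, search_path=None):
    """Return a list of detected problems; an empty list means all checks passed."""
    environ = os.environ if environ is None else environ
    search_path = sys.path if search_path is None else search_path
    problems = []

    # Step 1: artm/messages_pb2.py must be reachable from the search path.
    if not any(os.path.isfile(os.path.join(p, "artm", "messages_pb2.py"))
               for p in search_path):
        problems.append("Step 1: artm/messages_pb2.py not found on the search path")

    # Step 2: the protobuf runtime must be importable.
    try:
        importlib.import_module("google.protobuf")
    except ImportError:
        problems.append("Step 2: cannot import google.protobuf")

    # Step 3: the variable must be set and point to an existing file.
    lib = environ.get("ARTM_SHARED_LIBRARY")
    if not lib:
        problems.append("Step 3: ARTM_SHARED_LIBRARY is not set")
    elif not os.path.isfile(os.path.expanduser(lib)):
        problems.append("Step 3: %s does not exist" % lib)
    return problems


if __name__ == "__main__":
    for problem in check_bigartm_setup():
        print(problem)
```

Each reported step number matches the step in the script above, so a failing check points directly at the command to rerun.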
Troubleshooting
ubuntu@192.168.0.1:~/bigartm/src/python/examples$ python example01_synthetic_collection.py
Traceback (most recent call last):
File "example01_synthetic_collection.py", line 6, in <module>
import artm.messages_pb2, artm.library, random, uuid
ImportError: No module named artm.messages_pb2
This error indicates that Python is unable to locate the messages_pb2.py and library.py files.
Please verify that you executed Step #1 in the instructions above.
ubuntu@192.168.0.1:~/bigartm/src/python/examples$ python example01_synthetic_collection.py
Traceback (most recent call last):
File "example01_synthetic_collection.py", line 6, in <module>
import artm.messages_pb2, artm.library, random, uuid
File "/home/ubuntu/bigartm/src/python/artm/messages_pb2.py", line 4, in <module>
from google.protobuf import descriptor as _descriptor
ImportError: No module named google.protobuf
This error indicates that Python is unable to locate the protobuf library.
Please verify that you executed Step #2 in the instructions above.
If you do not have permission to run the sudo python setup.py install step, you may instead update PYTHONPATH manually:
export PYTHONPATH="/home/ubuntu/bigartm/3rdparty/protobuf/python:/home/ubuntu/bigartm/src/python:$PYTHONPATH"
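The same search-path change can also be made from inside a script, which may be convenient when the shell environment cannot be edited. This is a minimal sketch under the assumption that the two directories from the PYTHONPATH line above are the only ones needed; the helper name add_bigartm_paths is hypothetical.

```python
import os
import sys


def add_bigartm_paths(bigartm_root, search_path=None):
    """Prepend the protobuf and BigARTM Python directories to the search path."""
    search_path = sys.path if search_path is None else search_path
    root = os.path.expanduser(bigartm_root)
    wanted = [
        os.path.join(root, "3rdparty", "protobuf", "python"),
        os.path.join(root, "src", "python"),
    ]
    # Insert in reverse so the final order matches the PYTHONPATH example.
    for path in reversed(wanted):
        if path not in search_path:
            search_path.insert(0, path)
    return search_path


if __name__ == "__main__":
    add_bigartm_paths("~/bigartm")
    print(sys.path[:2])
```

Call the helper before the first import of artm or google.protobuf; paths added afterwards do not affect modules that have already failed to import in the same statement.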
ubuntu@192.168.0.1:~/bigartm/src/python/examples$ python example01_synthetic_collection.py
libartm.so: cannot open shared object file: No such file or directory, fall back to ARTM_SHARED_LIBRARY environment variable
Traceback (most recent call last):
File "example01_synthetic_collection.py", line 27, in <module>
with artm.library.MasterComponent() as master:
File "/home/ubuntu/bigartm/src/python/artm/library.py", line 179, in __init__
lib = Library().lib_
File "/home/ubuntu/bigartm/src/python/artm/library.py", line 107, in __init__
self.lib_ = ctypes.CDLL(os.environ['ARTM_SHARED_LIBRARY'])
File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
raise KeyError(key)
KeyError: 'ARTM_SHARED_LIBRARY'
This error indicates that BigARTM’s Python interface cannot locate the libartm.so (libartm.dylib) file.
Please verify that you executed Step #3 correctly.
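The traceback above shows library.py reading os.environ['ARTM_SHARED_LIBRARY'], so the variable can also be set from Python before artm.library is used. The sketch below picks the file name by operating system, assuming the build directory layout from Step 3; the helper name default_library_path is hypothetical.

```python
import os
import platform


def default_library_path(build_dir, system=None):
    """Return the expected libartm location for the given (or current) OS."""
    system = platform.system() if system is None else system
    # platform.system() reports "Darwin" on Mac OS X.
    name = "libartm.dylib" if system == "Darwin" else "libartm.so"
    return os.path.join(os.path.expanduser(build_dir), "src", "artm", name)


if __name__ == "__main__":
    # setdefault keeps any value that was already exported in the shell.
    os.environ.setdefault("ARTM_SHARED_LIBRARY",
                          default_library_path("~/bigartm/build"))
    print(os.environ["ARTM_SHARED_LIBRARY"])
```

Using setdefault rather than a plain assignment means an explicit export in the shell still takes precedence over the guessed path.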
BigARTM on Travis-CI
To get a live usage example of BigARTM you may check BigARTM’s .travis.yml script and the latest continuous integration build.