Segmentation is done with a library called MeCab. MeCab is an open-source morphological analysis engine. It can be used from PHP for Japanese text processing, and also from other programming languages such as Java and Python.
On Linux, we have to install a C++ compiler, MeCab, a dictionary for language processing, the PHP development module (for building PHP extensions), and the PHP extension for MeCab.
1. Install gcc
> sudo yum install gcc-c++ (Fedora) / > sudo apt-get install g++ (Ubuntu)
2. Download and install MeCab
Create a folder and download the libraries to it.
> mkdir Download
> cd Download
> wget http://downloads.sourceforge.net/project/mecab/mecab/0.98/mecab-0.98.tar.gz
> tar xvzf mecab-0.98.tar.gz
> cd mecab-0.98
> ./configure
> make
> sudo make install
> cd ..
3. Download and install a MeCab dictionary (ipadic)
> wget http://sourceforge.net/projects/mecab/files/mecab-ipadic/2.7.0-20070801/mecab-ipadic-2.7.0-20070801.tar.gz
> tar xvfz mecab-ipadic-2.7.0-20070801.tar.gz
> cd mecab-ipadic-2.7.0-20070801
> ./configure --with-charset=utf8
> make
> sudo make install
> cd ..
Check the MeCab version to make sure it is installed:
> /usr/local/bin/mecab -v
If it prints the version number, the installation is OK. If you get this error instead:
mecab: error while loading shared libraries: libmecab.so.1: cannot open shared object file: No such file or directory
the dynamic linker cannot find libmecab.so.1, which MeCab installs under /usr/local/lib by default. Add /usr/local/lib to /etc/ld.so.conf, run "sudo ldconfig", and then re-install ipadic.
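The libmecab.so.1 error usually means the dynamic linker does not search /usr/local/lib. The following is a minimal, read-only sketch for checking that before changing anything. The function name ld_conf_has_dir is made up for this example, and it only looks at the one file it is given; it does not follow "include" lines such as /etc/ld.so.conf.d/*.conf, so a "not listed" result is a hint rather than proof.

```shell
#!/bin/sh
# Return 0 if directory $1 appears as a whole line in the linker config file $2
# (default: /etc/ld.so.conf). "include" directives are NOT followed.
ld_conf_has_dir() {
    dir=$1
    conf=${2:-/etc/ld.so.conf}
    [ -r "$conf" ] && grep -qxF "$dir" "$conf"
}

# Read-only check: only prints a suggestion for what to do next.
if ld_conf_has_dir /usr/local/lib; then
    echo "/usr/local/lib is listed; run: sudo ldconfig"
else
    echo "/usr/local/lib is not listed in /etc/ld.so.conf; add it, then run: sudo ldconfig"
fi
```

After running ldconfig, "/usr/local/bin/mecab -v" should print the version number again.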
4. Install php development module
> sudo yum install php-devel (Fedora) / > sudo apt-get install php5-dev (Ubuntu)
5. Download and install the PHP extension for MeCab
Download php-mecab from https://github.com/rsky/php-mecab/archives/master , then unpack and install it.
> tar xfvz rsky-php-mecab-4193188.tar.gz
> cd rsky-php-mecab-4193188/
> phpize
> ./configure --with-php-config=/usr/bin/php-config --with-mecab=/usr/local/bin/mecab-config
> make
> sudo make install
6. Enable the MeCab extension
Add the following line to php.ini and restart Apache:
extension = mecab.so
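Before testing in the browser, it can help to confirm from the command line that PHP sees the extension. This is a small sketch, assuming the php CLI is on the PATH; has_module is a helper name invented here. Note that the CLI and Apache may read different php.ini files, so this checks the CLI configuration only.

```shell
#!/bin/sh
# Return 0 if the module name $1 appears (case-insensitively) as a whole line on stdin.
has_module() {
    grep -qixF "$1"
}

if command -v php >/dev/null 2>&1; then
    # "php -m" lists the loaded modules, one per line.
    if php -m | has_module mecab; then
        echo "mecab extension is loaded in the PHP CLI"
    else
        echo "mecab extension is NOT loaded; check the files shown by: php --ini"
    fi
else
    echo "php not found on PATH"
fi
```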
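Testing MeCab
<?php
error_reporting(-1);
// The extension name must be a quoted string, not a bare constant.
if (extension_loaded('mecab')) {
    echo "mecab loaded :)";
} else {
    echo "something is wrong :(";
}
// Split a Japanese sentence into an array of tokens.
$str = "また、Tagger は Stream をくるくる回すのではなく、一括で文字列を解析するようなので、一旦";
$result = mecab_split($str);
print_r($result);
?>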
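Output: if the extension is loaded, the script prints "mecab loaded :)" followed by the array of tokens returned by mecab_split().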
The available php-mecab extension functions are listed at http://mechsys.tec.u-ryukyu.ac.jp/~oshiro/php_mecab_apis.html
References
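If the PHP test fails, trying the same kind of segmentation from the command line can show whether the problem is in MeCab itself or in the PHP extension. The sketch below assumes the mecab binary is on the PATH and the dictionary was built with UTF-8 as above; tokens_per_line is a helper name made up for this example. It uses MeCab's "wakati" output format, which prints the tokens separated by spaces.

```shell
#!/bin/sh
# Turn space-separated (wakati-style) text on stdin into one token per line.
tokens_per_line() {
    tr -s ' ' '\n' | sed '/^$/d'
}

if command -v mecab >/dev/null 2>&1; then
    # -O wakati: output only the surface forms, separated by spaces.
    echo "すもももももももものうち" | mecab -O wakati | tokens_per_line
else
    echo "mecab not found on PATH"
fi
```

If this prints tokens but the PHP script does not, the extension setup in php.ini is the more likely culprit.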
http://www.programming-magic.com/20080808173652/
http://wiki.jdictionary.com/Building_Mecab_For_PHP
http://mechsys.tec.u-ryukyu.ac.jp/~oshiro/php_mecab_apis.html#mecab_split
https://github.com/rsky/php-mecab/tree/
http://tips.recatnap.info/about_mecab_extension_php/
YES!!! I did it! After a few hours of brain... whatever..
I wanna say thanx to you, but if you get an error during the ipadic install:
mecab: error while loading shared libraries: libmecab.so.1: cannot open shared object file:
you must do this: go to /etc/d.so.conf, edit it, and add the string "/usr/local/lib" after the first include, then run "sudo ldconfig" and try again to reinstall ipadic.
huh...
The path above to edit should actually be /etc/ld.so.conf.
More info can be found here: http://kooj.blog102.fc2.com/blog-entry-24.html