# Introduction
This documentation describes the general configuration and common usage of the BI Server, both to prevent malfunctions and to help recover from failures.
# Access info
BISRV1
DNS: bisrv1.local.giffits.de
IP: 192.168.100.100
SSH: username (administrator), password (shared by Keeper)
ROOT: password (shared by Keeper)

BISRV2
DNS: bisrv2.local.giffits.de
IP: 192.168.100.88
SSH: username (administrator), password (shared by Keeper)
ROOT: password (shared by Keeper)
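For example, a typical session on BISRV1 starts with SSH as the administrator user and, when needed, switches to root (both passwords are shared via Keeper):

```bash
# Log in to BISRV1 (use bisrv2.local.giffits.de for BISRV2)
ssh administrator@bisrv1.local.giffits.de

# Switch to root when administrative privileges are required
su -
```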
# Installed hardware and software
- Hardware:
  - RAM: 16 GB
  - Core(s): 2 (GenuineIntel)
- Operating System:

  ```
  $ lsb_release -a
  Distributor ID: Debian
  Description:    Debian GNU/Linux 9.1 (n/a)
  Release:        9.1
  ```

- JAVA:

  ```
  $ java -version
  java version "1.7.0_151"
  OpenJDK Runtime Environment (IcedTea 2.6.11) (7u151-2.6.11-1~deb8u1)
  OpenJDK 64-Bit Server VM (build 24.151-b01, mixed mode)
  ```

- Apache2:

  ```
  $ apachectl -v
  Server version: Apache/2.4.10 (Debian)
  Server built:   Sep 15 2016 20:44:43
  ```

- PHP:

  ```
  $ php -v
  PHP 5.6.30-0+deb8u1 (cli) (built: Feb 8 2017 08:50:21)
  Copyright (c) 1997-2016 The PHP Group
  Zend Engine v2.6.0, Copyright (c) 1998-2016 Zend Technologies
      with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2016, by Zend Technologies
  ```

- ImageMagick:

  ```
  $ identify -version
  Version: ImageMagick 6.8.9-9 Q16 x86_64 2016-11-26 http://www.imagemagick.org
  Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
  Features: DPC Modules OpenMP
  Delegates: bzlib cairo djvu fftw fontconfig freetype jbig jng jpeg lcms lqr ltdl lzma openexr pangocairo png rsvg tiff wmf x xml zlib
  ```

- PostgreSQL:
  ```
  PostgreSQL 9.4.13 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
  ```

  This was the free database of choice to store the data warehouse, data marts and similar bases. The admin user is “postgres”; since this user has no password associated, you have to become postgres after becoming root (see the example after this list).

- Pentaho BI-Server:
  There is one instance of the Pentaho bi-server deployed under:

  ```
  /srv/bi/biserver-ce/
  ```

  It uses port 8009 for AJP connections via Apache2 and is configured to use Postgres as its system database.

- Pentaho Data-Integration:
  There are various instances of PDI (a.k.a. Kettle).

  The Data Warehouse is updated on a daily basis using the PDI instance located under:

  ```
  /srv/bi/data-integration/
  ```

  You'll find two more instances that run processes on demand and wait for requests to process. These are located under:

  ```
  /srv/Test/data-integration/
  /srv/Live/data-integration/
  ```

  The instance under the Test environment is configured to use the Test database connections, while the Live connections are defined in the other instance.

  Both instances also run as a Carte service, which listens for execution requests and saves the time of launching a new Java Virtual Machine (JVM) per request.

  The Carte instance for testing purposes runs on port 9090 and can be accessed from the outside world using this URL:

  ```
  http://cluster:cluster@192.168.100.100:9090/kettle/status/
  ```

  The production instance can only be accessed from localhost using:

  ```
  http://cluster:cluster@localhost:9191/kettle/status
  ```

  This is why a PHP service was exposed to invoke these services, for both live and test purposes (see the curl examples after this list).
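As noted in the PostgreSQL entry above, the "postgres" admin user has no password associated, so it is reached through root. A minimal sketch of the procedure:

```bash
# Become root first (password shared by Keeper)...
su -

# ...then switch to the passwordless postgres admin user
su - postgres

# and open an interactive database session
psql
```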
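To quickly verify that both Carte services are alive, their status pages can also be queried from a shell, e.g. with curl:

```bash
# Test instance (port 9090, reachable from the network)
curl "http://cluster:cluster@192.168.100.100:9090/kettle/status/"

# Live instance (port 9191, only reachable from the server itself)
curl "http://cluster:cluster@localhost:9191/kettle/status"
```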
# ETL Services
# Docky
There are three services designed for Docky-specific tasks:
- processCardCode

  Source code is located at:

  ```
  [Live] /var/www/html/live/docky/processCardCode.php
  [Test] /var/www/html/test/docky/processCardCode.php
  ```

  It can be invoked in the form of an HTTP request:

  ```
  [Live] http://bisrv/live/docky/processCardCode.php?cardcode=B1234&tool=caller
  [Test] http://bisrv/test/docky/processCardCode.php?cardcode=B1234&tool=caller
  ```

  It expects one parameter indicating which customer to process. The tool parameter is optional and intended to ease identification of callers in the access log (for instance, one tool sending lots of requests per second). See also the curl examples after this list.

- processFile
  Source code is located at:

  ```
  [Live] /var/www/html/live/docky/processFile.php
  [Test] /var/www/html/test/docky/processFile.php
  ```

  It can be invoked in the form of an HTTP request:

  ```
  [Live] http://bisrv/live/docky/processFile.php?path=//gifham002/Logos/Live/...
  [Test] http://bisrv/test/docky/processFile.php?path=//gifham002/Logos/Test/...
  ```

  Unlike the previous service, which is designed to synchronize all files from one customer, this one will only add the file at hand, saving time and effort in the process.

  It expects the “path” parameter, which must point to a physical file under //gifham002/Logos/, and asynchronously invokes the corresponding Carte service:

  ```
  [Live] http://cluster:cluster@localhost:9191/kettle/executeJob/?job=/srv/Live/UpDocky/processFile.kjb&level=Minimal&filename=/local/path/to/file
  [Test] http://cluster:cluster@192.168.100.100:9090/kettle/executeJob/?job=/srv/Test/UpDocky/processFile.kjb&level=Minimal&filename=/local/path/to/file
  ```

  This process assumes that the file doesn't exist yet and saves time by avoiding double checks. So if it is invoked several times with the same argument, it will duplicate info in the Docky database, with multiple records pointing to the same physical file.
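For reference, both Docky services can be invoked from a shell as well. The card code and file name below are hypothetical example values:

```bash
# Synchronize all files of customer B1234 (test environment)
curl "http://bisrv/test/docky/processCardCode.php?cardcode=B1234&tool=caller"

# Add one single file; the path must point to a physical file under //gifham002/Logos/
# (the file name "B1234/logo.eps" is a made-up example)
curl "http://bisrv/test/docky/processFile.php?path=//gifham002/Logos/Test/B1234/logo.eps"
```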
# ImageMagick

There are other services designed for image manipulation, given that ImageMagick is already installed and has access to the remote folders.
- dockyPreview

  Source code is located at:

  ```
  [Live] /var/www/html/live/imageMagik/dockyPreview.php
  [Test] /var/www/html/test/imageMagik/dockyPreview.php
  ```

  It is intended to generate a Large preview for the document at hand, and in the process it produces a small preview as well.
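The actual ImageMagick commands live in dockyPreview.php; purely as an illustrative sketch (file names and sizes here are assumptions, not the values used by the service), generating a Large preview and deriving a small one typically looks like:

```bash
# Render a large preview from the source image...
convert source_logo.png -resize 1024x1024 preview_large.jpg

# ...and derive the small preview from the large one
convert preview_large.jpg -resize 200x200 preview_small.jpg
```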
# Pentaho Server
The Pentaho BI Server (Community Edition), found in /srv/bi/biserver-ce/, will launch automatically with the system. You can manually stop/start the server with the commands:

```
# /etc/init.d/pentaho stop
# /etc/init.d/pentaho start
```

Tomcat is the J2EE container that runs the Pentaho bi-server, and you can read console errors using:

```
$ tail -f /srv/bi/biserver-ce/tomcat/logs/catalina.out
```

There is also an audit log that can be used to analyze the usage of this service:

```
$ tail -f /srv/bi/biserver-ce/pentaho-solutions/system/logs/audit/PentahoAuditLog.log
```

Specific and detailed Kettle/Pentaho documentation can be found here: Kettle Pentaho Documentation
# Scheduled Tasks (Crontab)
There are some tasks that run regularly and are managed by the system crontab. You can list the current entries using:

```
$ crontab -l
```

(and edit them with `crontab -e`).
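For reference when reading the entries below, the five leading fields of each crontab line encode the schedule:

```
# ┌───────── minute (0-59)
# │ ┌─────── hour (0-23)
# │ │ ┌───── day of month (1-31)
# │ │ │ ┌─── month (1-12)
# │ │ │ │ ┌─ day of week (0-7; 0 and 7 are Sunday)
# * * * * *  command to execute
```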
- Data warehouse daily synchronization
  ```
  30 01 * * * /srv/bi/update.sh > /srv/bi/logs/update_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
  ```

  Every day around 01h30, /srv/bi/update.sh is launched to update the local data warehouse and regenerate the multiple data marts that feed reports and dashboards. The entire process currently lasts until 10h00 in the worst cases.

- Standskizzen
  ```
  05 20 * * * /srv/Live/Standskizzen/run.sh
  ```

  Every day around 20h05, this process looks for new documents in Standskizzen that are not in JPG format and produces a preview of them. This process sends an email if there is any failure.

- UPS Dashboard update

  ```
  00 07 * * * /srv/bi/ups.sh > /srv/bi/logs/ups_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
  ```

  Every day around 07h00, this process rewrites the data file used to feed the UPS dashboard.

- DockyDashboard Update

  ```
  15 08,10,12,14,16,18,20 * * 1-5 /srv/Live/DockyDashboard/run.sh
  ```

  Regularly from Monday to Friday, at minute 15 of hours 8 to 20 (every two hours), the data file for the DockyDashboard gets updated by this task. The same task can also be invoked via the link enabled within the dashboard.

- Maintenance restart

  ```
  0 23 * * 6 shutdown -r now
  ```

  This task will restart the BI server every Saturday at 23h00, to prevent failures due to long runs of services. From time to time, random errors appear that are easily solved by restarting services, which is why this task was scheduled on a weekly basis.

- Check errors
  At 19:00 on every day-of-week from Monday through Friday, Kettle executes a performance check for cardcodes.

- CMYK cronjob

  ```
  30 18 * * 2 /srv/Live/BilderColorSpace/run.sh
  ```

  At 18:30 on Tuesday, Kettle extracts the files that are not in RGB color space and places them into a Kettle report.

- Invoices Gross Profit check

  ```
  30 8 * * 2 /srv/bi/weekly.sh
  ```

  At 08:30 on Tuesday, Kettle checks the database for invoices gross profit errors.

- Kettle check
  ```
  0,10,20,30,40,50 * * * 0-5 curl "http://cluster:cluster@localhost:9091/kettle/executeJob/?job=/srv/Live/checker/check.kjb&level=Minimal"
  ```

  At minutes 0, 10, 20, 30, 40, and 50 on every day-of-week from Sunday through Friday, Kettle runs /srv/Live/checker/check.kjb.

- Mount process

  ```
  0 5 * * 1-5 mount -o remount -a
  ```

  At 05:00 on every day-of-week from Monday through Friday, the system remounts the already-mounted filesystems.

- Pronto data
  ```
  0 7 * * * /srv/bi/pronto.sh > /srv/bi/logs/pronto_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
  ```

  At 07:00, Kettle updates the information file used to generate the Pronto dashboard.

- Druckflachen data

  ```
  30 7 * * * /srv/bi/snapshot.sh > /srv/bi/logs/snapshot_druckflachen_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
  ```

  At 07:30, Kettle updates the information file used to generate the Druckflachen dashboard.

- GVV data

  ```
  0 8,12,16 * * * /srv/bi/gvv.sh > /srv/bi/logs/gvv_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
  ```

  At minute 0 past hours 8, 12, and 16, Kettle updates the information file used to generate the GVV dashboard.

- Lisa data

  ```
  0 4 * * * /srv/bi/LISA-umsatz.sh > /srv/bi/logs/LISA-umsatz_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
  ```

  At 04:00, Kettle updates the information file used to generate the Lisa dashboard.

- Shop data

  ```
  0 8 * * 1-5 /srv/bi/einkaufen.sh > /srv/bi/logs/einkaufen_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
  ```

  At 08:00 on every day-of-week from Monday through Friday, Kettle updates the information file used to generate the Shop dashboard.

- Monthly information report

  ```
  0 7 1 * * /srv/bi/monthly.sh
  ```

  At 07:00 on day-of-month 1, Kettle updates the information file used to generate the monthly dashboard.

- Remove temp files
  ```
  0 0 * * 0 rm -rf /tmp/*
  0 0 * * 1 rm -rf /srv/tmp/*
  ```

  At 00:00 on Sunday and at 00:00 on Monday respectively, the system deletes the /tmp/* and /srv/tmp/* files.

- Remove ImageMagick temp files

  ```
  * * * * * find /tmp/magic-* -mmin +20 -exec rm {} \;
  ```

  Every minute, the system looks for ImageMagick temp files older than 20 minutes and deletes them.

- Pause

  ```
  21 18 * * * /bin/sleep 1
  ```

  At 18:21, the system simply sleeps for 1 second.