# Introduction

This documentation describes the general configuration and common usage of the BI Server, in order to prevent malfunctions and to help recover from failures.

# Access info

# Installed hardware and software

  • Hardware
    RAM: 16 GB
    Cores: 2 (GenuineIntel)
  • Operating System:
    $ lsb_release -a
    Distributor ID: Debian
    Description: Debian GNU/Linux 9.1 (n/a)
    Release: 9.1
  • JAVA:
    $ java -version
    java version "1.7.0_151"
    OpenJDK Runtime Environment (IcedTea 2.6.11) (7u151-2.6.11-1~deb8u1)
    OpenJDK 64-Bit Server VM (build 24.151-b01, mixed mode)
  • Apache2:
    $ apachectl -v
    Server version: Apache/2.4.10 (Debian)
    Server built: Sep 15 2016 20:44:43
  • PHP:
    $ php -v
    PHP 5.6.30-0+deb8u1 (cli) (built: Feb 8 2017 08:50:21)
    Copyright (c) 1997-2016 The PHP Group
    Zend Engine v2.6.0, Copyright (c) 1998-2016 Zend Technologies
    with Zend OPcache v7.0.6-dev, Copyright (c) 1999-2016, by Zend Technologies
  • ImageMagick:
    $ identify -version
    Version: ImageMagick 6.8.9-9 Q16 x86_64 2016-11-26 http://www.imagemagick.org
    Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
    Features: DPC Modules OpenMP
    Delegates: bzlib cairo djvu fftw fontconfig freetype jbig jng jpeg lcms lqr ltdl lzma openexr pangocairo png rsvg tiff wmf x xml zlib
  • PostgreSQL:
    PostgreSQL 9.4.13 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
    This is the free database of choice to store the data warehouse, data marts and similar bases. The admin user is "postgres"; since this user has no password associated, you must first become root and then switch to postgres (e.g. su - postgres).
  • Pentaho BI-Server:
    There is one instance of Pentaho bi-server deployed on:
    /srv/bi/biserver-ce/
    It uses port 8009 for AJP connections via Apache2. It is configured to use PostgreSQL as its system database.
  • Pentaho Data-Integration:
    There are various instances of PDI (a.k.a. Kettle).
    The Data Warehouse is updated on a daily basis using the PDI instance located under:
    /srv/bi/data-integration/
    You'll find two more instances that run processes on demand, waiting for requests to process.
    These are located under:
    /srv/Test/data-integration/
    /srv/Live/data-integration/
    The instance under the Test environment is configured to use Test database connections, while the Live connections are defined in the other instance.
    Both instances also run as a Carte service, a way of listening for execution requests that saves the time devoted to launching a new Java Virtual Machine (JVM) per request.
    The Carte instance for testing purposes runs on port 9090 and can be accessed from the outside world using this URL:
    http://cluster:cluster@192.168.100.100:9090/kettle/status/
    The production instance can only be accessed from localhost using:
    http://cluster:cluster@localhost:9191/kettle/status
    This is why PHP services were exposed to invoke these Carte services, for both Live and Test purposes.
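
As a quick sanity check, both Carte status pages can be queried from the command line; a minimal sketch using curl and the credentials documented above (the Live check must run on the BI server itself, since port 9191 only listens on localhost):

$ curl "http://cluster:cluster@192.168.100.100:9090/kettle/status/"
$ curl "http://cluster:cluster@localhost:9191/kettle/status"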

# ETL Services

# Docky

There are three services designed for Docky-specific tasks:

  • processCardCode
    Source code is located at:
    [Live] /var/www/html/live/docky/processCardCode.php
    [Test] /var/www/html/test/docky/processCardCode.php
    It can be invoked in the form of an HTTP request:
    [Live] http://bisrv/live/docky/processCardCode.php?cardcode=B1234&tool=caller
    [Test] http://bisrv/test/docky/processCardCode.php?cardcode=B1234&tool=caller
    It expects one parameter indicating which customer to process. The tool parameter is optional and is intended to ease identification of callers in the access log (for instance, one tool sending lots of requests per second). See also the curl sketch after this list.
  • processFile
    Source code is located at:
    [Live] /var/www/html/live/docky/processFile.php
    [Test] /var/www/html/test/docky/processFile.php
    It can be invoked in the form of an HTTP request:
    [Live] http://bisrv/live/docky/processFile.php?path=//gifham002/Logos/Live/...
    [Test] http://bisrv/test/docky/processFile.php?path=//gifham002/Logos/Test/...
    Unlike the previous service, which synchronizes all files from one customer, this one only adds the file at hand, saving time and effort in the process.
    It expects the "path" parameter, which must point to a physical file under //gifham002/Logos/, and asynchronously invokes the corresponding Carte service:
    [Live] http://cluster:cluster@localhost:9191/kettle/executeJob/?job=/srv/Live/UpDocky/processFile.kjb&level=Minimal&filename=/local/path/to/file
    [Test] http://cluster:cluster@192.168.100.100:9090/kettle/executeJob/?job=/srv/Test/UpDocky/processFile.kjb&level=Minimal&filename=/local/path/to/file
    This process assumes that the file does not exist yet, saving time by avoiding double checks. So if it is invoked several times with the same argument, it will duplicate information in the Docky database, with multiple records pointing to the same physical file.
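
For reference, a minimal curl sketch for invoking both services above (the cardcode value and the file name are placeholders, not real data; -G with --data-urlencode takes care of URL-encoding the path):

$ curl "http://bisrv/test/docky/processCardCode.php?cardcode=B1234&tool=caller"
$ curl -G "http://bisrv/test/docky/processFile.php" --data-urlencode "path=//gifham002/Logos/Test/example.pdf"
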
# ImageMagick

There are other services designed for image manipulation, given that ImageMagick is already installed and has access to the remote folders.

  • dockyPreview
    Source code is located at:
    [Live] /var/www/html/live/imageMagik/dockyPreview.php
    [Test] /var/www/html/test/imageMagik/dockyPreview.php
    It is intended to generate a large preview for the document at hand, producing a small preview in the process.
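
As an illustration only (the actual commands, file names and sizes live in the PHP source and may differ), such a preview service typically boils down to ImageMagick calls of this kind:

$ convert original.tif -thumbnail 800x800 preview_large.jpg
$ convert preview_large.jpg -thumbnail 150x150 preview_small.jpg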

# Pentaho Server

The Pentaho BI Server (Community Edition), found in /srv/bi/biserver-ce/, launches automatically with the system. You can manually stop/start the server with the commands:

# /etc/init.d/pentaho stop
# /etc/init.d/pentaho start

Tomcat is the J2EE container that runs the Pentaho BI Server, and you can read console errors using:

$ tail -f /srv/bi/biserver-ce/tomcat/logs/catalina.out

There is also an audit log that can be used to analyze usage of this service in:

$ tail -f /srv/bi/biserver-ce/pentaho-solutions/system/logs/audit/PentahoAuditLog.log
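
When hunting for a specific failure, grepping the Tomcat log is often faster than tailing it; for example (the pattern is only a suggestion):

$ grep -i -E "error|exception" /srv/bi/biserver-ce/tomcat/logs/catalina.out | tail -n 50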

Specific and detailed Kettle documentation can be found in the Kettle Pentaho Documentation.

# Scheduled Tasks (Crontab)

There are some tasks that run regularly and are managed by the system crontab. You can view and edit the current list using:

crontab -e
  1. Data warehouse daily synchronization
    30 01 * * * /srv/bi/update.sh > /srv/bi/logs/update_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
    Every day around 01h30, /srv/bi/update.sh is launched to update the local data warehouse and regenerate the multiple data marts that feed reports and dashboards. The entire process currently lasts until 10h00 in the worst cases. (See the note on log file names after this list.)
  2. Standskizzen
    05 20 * * * /srv/Live/Standskizzen/run.sh
    Every day around 20h05, this process looks for new documents in Standskizzen that are not in JPG format, in order to produce a preview. It sends an email if there is any failure.
  3. UPS Dashboard update
    00 07 * * * /srv/bi/ups.sh > /srv/bi/logs/ups_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
    Every day around 07h00, this process rewrites the data file used to feed the UPS dashboard.
  4. DockyDashboard Update
    15 08,10,12,14,16,18,20 * * 1-5 /srv/Live/DockyDashboard/run.sh
    Regularly from Monday to Friday, at minute 15 of hours 8 to 20 (every two hours), the data file for the DockyDashboard gets updated by this task. The same task can be invoked via the link enabled within the dashboard.
  5. Maintenance restart
    0 23 * * 6 shutdown -r now
    This task restarts the BI server every Saturday at 23h00, to prevent failures due to long-running services. From time to time random errors appear and are easily solved by restarting services, which is why this task was scheduled on a weekly basis.
  6. Check errors
    0 23 * * 6 shutdown -r now
    At 19:00 on every day-of-week from Monday through Friday, Kettle executes a performance check for cardcodes.
  7. CMYK cronjob
    30 18 * * 2 /srv/Live/BilderColorSpace/run.sh
    At 18:30 on Tuesday, Kettle extracts non-RGB files and places those files into a Kettle report.
  8. Invoices Gross Profit check
    30 8 * * 2 /srv/bi/weekly.sh
    At 08:30 on Tuesday, Kettle checks the invoices gross profit data for database errors.
  9. Kettle check
    0,10,20,30,40,50 * * * 0-5 curl "http://cluster:cluster@localhost:9091/kettle/executeJob/?job=/srv/Live/checker/check.kjb&level=Minimal"
    At minutes 0, 10, 20, 30, 40, and 50 on every day-of-week from Sunday through Friday, Kettle runs /srv/Live/checker/check.kjb.
  10. Mount process
    0 5 * * 1-5 mount -o remount -a
    At 05:00 on every day-of-week from Monday through Friday, the system remounts the already-mounted filesystems.
  11. Pronto data
    0 7 * * * /srv/bi/pronto.sh > /srv/bi/logs/pronto_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
    At 07:00, Kettle updates the information file used to generate the Pronto dashboard.
  12. Druckflachen data
    30 7 * * * /srv/bi/snapshot.sh > /srv/bi/logs/snapshot_druckflachen_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
    At 07:30, Kettle updates the information file used to generate the Druckflachen dashboard.
  13. GVV data
    0 8,12,16 * * * /srv/bi/gvv.sh > /srv/bi/logs/gvv_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
    At minute 0 past hours 8, 12, and 16, Kettle updates the information file used to generate the GVV dashboard.
  14. Lisa data
    0 4 * * * /srv/bi/LISA-umsatz.sh > /srv/bi/logs/LISA-umsatz_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
    At 04:00, Kettle updates the information file used to generate the Lisa dashboard.
  15. Shop data
    0 8 * * 1-5 /srv/bi/einkaufen.sh > /srv/bi/logs/einkaufen_`date +\%y\%m\%d_\%H\%M\%S`.log 2>&1
    At 08:00 on every day-of-week from Monday through Friday, Kettle updates the information file used to generate the Shop dashboard.
  16. Monthly information report
    0 7 1 * * /srv/bi/monthly.sh
    At 07:00 on day-of-month 1, Kettle updates the information file used to generate the monthly dashboard.
  17. Remove temp files
    0 0 * * 0 rm -rf /tmp/*
    0 0 * * 1 rm -rf /srv/tmp/*
    At 00:00 on Sunday and at 00:00 on Monday, the system deletes the files under /tmp/ and /srv/tmp/ respectively.
  18. Remove ImageMagick temp files
    * * * * * find /tmp/magic-* -mmin +20 -exec rm {} \;
    Every minute, the system looks for ImageMagick temp files older than 20 minutes and deletes them.
  19. Pause
    21 18 * * * /bin/sleep 1
    At 18:21, the system runs a 1-second sleep.
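
A note on the log file names used by most entries above: in a crontab the % character is special (it starts a new line of standard input), which is why it is escaped as \% inside the backtick command substitution. Outside cron, the same pattern expands like this (example output; the actual timestamp depends on the moment of execution):

$ echo update_`date +%y%m%d_%H%M%S`.log
update_230609_013000.log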