With the following method, you can use your own existing Docker image or even create a new one to run a custom step. Theoretically, you can use any executable which can run on Linux.
We have described the process along with some examples in the following package: pipeline-example-embedded-linux-binary.zip
Below, you will find the general instructions on how to embed your own Docker image based on a Python example. The very same instructions can also be found in the top-level README.md file of the package above.
Furthermore, we have implemented examples using other executables. They are located in the other_examples folder of the archive above. Each of them has a specific README.md file located in its respective subfolder.
Extracting and compressing a file system from a Docker image
Build an example Docker image with a newer Python version from a Dockerfile (inside ./resources). The Docker image might install build-essential and must install fakechroot in order to decouple the file system of this Docker image from the host's file system in the pipeline processing container.
Use our provided Dockerfile for testing, or use your own Docker image as the FROM base:
```dockerfile
# Uses Python (based on ubuntu:22.04) or try to use your own docker image (based on libc)
FROM docker.io/ubuntu:22.04

# add build-essential (useful for compiling Python modules during pip install)
RUN apt-get update && apt-get install -y --no-install-recommends build-essential python3.11 python3-pip \
    && apt-get clean && rm -rf /var/lib/apt/lists/* # Clean up to keep the image size as small as possible

# add fakechroot to allow decoupling from host operating system without root permission
RUN apt-get update && apt-get install -y --no-install-recommends fakechroot \
    && apt-get clean && rm -rf /var/lib/apt/lists/* # Clean up to keep the image size as small as possible

CMD ["./bin/bash"]
```

Open a terminal (e.g. Bash or Cmd) for the example in the directory of this README.
Build a Docker image and give it a name (e.g. here it is python_image), with the following command:
```shell
docker build -t python_image -f ./resources/Dockerfile .
```
Create the Docker container without starting it, in order to build the file system of the image.
Linux:
```shell
CONTAINER_ID=$(docker create python_image) && echo $CONTAINER_ID
```
Windows Cmd:
```
docker create python_image > CONTAINER_ID
set /P CONTAINER_ID=<CONTAINER_ID
del CONTAINER_ID
echo %CONTAINER_ID%
```

Export the file system of the newly created image into a compressed file (we use .xz for minimal file size).
Linux:
```shell
docker export $CONTAINER_ID | xz > ./resources/distro_flat.xz
```
Windows Cmd:
For compression, you need an xz tool available in your %PATH% environment.

```
docker export %CONTAINER_ID% | xz > .\resources\distro_flat.xz
```
If you have GitBash (Mingw64) installed, you could use the xz.exe from there with the following commands:
```
rem // Store path to git.exe in variable GIT_EXE_PATH
where git > GIT_EXE_PATH
set /P GIT_EXE_PATH=<GIT_EXE_PATH
del GIT_EXE_PATH
rem // Store current directory and switch to git\mingw64\bin directory
pushd %GIT_EXE_PATH%\..\..\mingw64\bin
rem // Save path of current directory %CD% (where xz.exe is located)
set GIT_TOOLS_DIR=%CD%
rem // Restore original directory
popd
docker export %CONTAINER_ID% | "%GIT_TOOLS_DIR%"\xz.exe > .\resources\distro_flat.xz
```

The resulting distro_flat.xz can then be embedded into a custom step package for use within an Insights processing pipeline. To use the compressed file system of the desired distribution, some more information is needed and must be extracted from the image.
The following command will use the script create_environment.sh to generate the output for our constant.py. This script must be executed inside your image. With the following command, the Docker image is executed with a local mount to the scripts directory and will execute the script to collect the paths used in the example.
Linux:
```shell
docker run --rm -it -v "/$(pwd)/scripts:/scripts" python_image bash -c "chmod +x ./scripts/create_environment.sh && ./scripts/create_environment.sh"
```
Windows Cmd:
```
docker run --rm -it -v "%cd%/scripts:/scripts" python_image bash -c "chmod +x ./scripts/create_environment.sh && ./scripts/create_environment.sh"
```
If it does not work with your Linux distribution, or you would rather do it on your own, the following commands might help you find out what is wrong or what is required to start an executable of your choice. The script executes the following commands in a row and builds the variables used in constant.py. You may also use this script to retrieve the necessary environment paths on your own, adapt them, and use them in your own custom step if you do not use the example step.py, constant.py and embedded_linux.py.
Find the paths in the image for the ELF loader and for the fakechroot library. These paths must be used in your step.py to call the executable of your choice in the image inside the Insights processing pipeline container.
```shell
docker run --rm -it python_image bash -c "find / -name ld-*.so* -or -name libfakechroot.so | sed -u 's/^/\$\(pwd\)/'"

# Expected output
$(pwd)/usr/lib64/ld-linux-x86-64.so.2
$(pwd)/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so

# Expected output on MacOS
$(pwd)/usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
$(pwd)/usr/lib/aarch64-linux-gnu/fakechroot/libfakechroot.so
$(pwd)/usr/lib/ld-linux-aarch64.so.1
```

These values are used to create environment variables for FAKECHROOT_ELFLOADER and for LD_PRELOAD. In our example, these paths are stored in the file constant.py as variables named OWN_ELF_LOADER and OWN_PRELOAD, and are used by the helper methods in embedded_linux.py which are called in step.py.
Check if the following command, which lists all files in ld.so.conf in the image, is working. This command could also be executed locally on your own file system in your Docker container and should list the libraries in your container.
```shell
docker run --rm -it python_image bash -c "ls /etc/ld.so.conf /etc/ld.so.conf.d/* | xargs cat | grep -v -E -e '^\s*(include|#|$)|fakechroot'"

# Expected output
/usr/local/lib
/usr/local/lib/x86_64-linux-gnu
/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu

# Expected output on MacOS
/usr/local/lib/aarch64-linux-gnu
/lib/aarch64-linux-gnu
/usr/lib/aarch64-linux-gnu
/usr/local/lib
```

Those values, joined with ':' colons and each prepended with $(pwd), are used in the example in constant.py as the variable named OWN_LIBRARY_PATHS and provide the environment variable LD_LIBRARY_PATH. This environment variable is necessary in your step.py and must be set before calling the executable of your choice.
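The shell pipeline in the next step performs this join automatically; the same transformation can also be sketched in plain Python. The directory list below simply reuses the x86_64 values from the expected output above, and OWN_LIBRARY_PATHS mirrors the variable name from the example's constant.py:

```python
# Library directories reported by ld.so.conf inside the image (x86_64 example values)
image_lib_dirs = [
    '/usr/local/lib',
    '/usr/local/lib/x86_64-linux-gnu',
    '/lib/x86_64-linux-gnu',
    '/usr/lib/x86_64-linux-gnu',
]

# Prepend $(pwd) to each path and join with ':' to form the LD_LIBRARY_PATH value
OWN_LIBRARY_PATHS = ':'.join('$(pwd)' + d for d in image_lib_dirs)
```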
We could build the LD_LIBRARY_PATH variable for the image (without the fakechroot library path).
```shell
docker run --rm -it python_image bash -c "ls /etc/ld.so.conf /etc/ld.so.conf.d/* | xargs cat | grep -v -E -e '^\s*(include|#|$)|fakechroot' | sed -u 's/^/\$\(pwd\)/' | tr '\n' ':' | rev | cut -c 2- | rev | xargs -n1 printf \"LD_LIBRARY_PATH=%s\""

# Expected output
LD_LIBRARY_PATH=$(pwd)/usr/local/lib:$(pwd)/usr/local/lib/x86_64-linux-gnu:$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu

# Expected output on MacOS
LD_LIBRARY_PATH=$(pwd)/usr/local/lib/aarch64-linux-gnu:$(pwd)/lib/aarch64-linux-gnu:$(pwd)/usr/lib/aarch64-linux-gnu:$(pwd)/usr/local/lib
```

In the example embedded_linux.py, we join the environment variables with space characters. Later on, we use them in the helper method run, which executes the command string in a Python subprocess.run.
```python
env_variables = ' '.join([
    'FAKECHROOT_ELFLOADER=' + constant.OWN_ELF_LOADER,
    'FAKECHROOT_BASE=$(pwd)',
    'LD_PRELOAD=' + constant.OWN_PRELOAD,
    'LD_LIBRARY_PATH=' + constant.OWN_LIBRARY_PATHS
])
```

The resulting command which is executed in the processing pipeline should look as follows:
```shell
# Environment variables for the loader that executes your executable
FAKECHROOT_ELFLOADER=$(pwd)/usr/lib64/ld-linux-x86-64.so.2
FAKECHROOT_BASE=$(pwd)
LD_PRELOAD=$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so
LD_LIBRARY_PATH=$(pwd)/usr/local/lib:$(pwd)/usr/local/lib/x86_64-linux-gnu:$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu

# The executable is the argument of the loader (ld-2.35.so). The loader will execute it.
$(pwd)/usr/lib64/ld-linux-x86-64.so.2 <executable_of_your_choice>
```

In Python, we use the module subprocess to execute the command in a separate process.
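A minimal sketch of what such a run helper could look like — the authoritative version ships in the example's embedded_linux.py; this simplified variant is an assumption, not the shipped code:

```python
import subprocess

def run(env_variables, command, cwd='./distro'):
    # Prefix the command with the prepared environment variable assignments and
    # execute it in a shell, with the extracted file system as working directory.
    # Capturing stdout also prevents the output from interfering with the pipeline.
    full_command = env_variables + ' ' + command
    result = subprocess.run(full_command, shell=True,
                            stdout=subprocess.PIPE, cwd=cwd)
    return result.stdout.decode('utf-8')
```

A call such as run(env_variables, '$(pwd)/usr/lib64/ld-linux-x86-64.so.2 <executable_of_your_choice>') would then execute the loader with the configured environment.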
Now, we need to package our own file system and the environment variables, prepared as above, in a custom step. Therefore, we need to extract the file system inside the custom step, fix the symbolic links and finally call our own executable.
Using a compressed file system in a custom step inside the pipeline
First, you need to extract your own compressed file system inside your custom step runtime. Inside your custom step, you need to use something like the following code snippet to extract your compressed file system into a newly created directory (here named distro).
```python
def unpack_xz():
    os.mkdir('./distro')
    os.chdir('./distro')
    with tarfile.open('../resources/distro_flat.xz') as f:
        f.extractall('.')
    os.chdir('./..')
```

Next, all the symbolic links inside the extracted file system have to be relocated so that they point to the outer absolute path of your image's file system, because the Linux kernel will resolve them without the injected libc or fakechroot. Any symlink that points to an absolute file outside of your file system should be redirected to a new absolute path inside your extracted root directory, except for /proc and /dev. They must point to the outer file system.
You should consume the stdout output of your subprocess; otherwise, the pipeline processing may interfere with it.
```python
def fix_symbolic_links():
    cmd_fix_symlinks = '''
        find $(pwd) -xdev -type l | while read linkname; do
            target=`readlink "$linkname"`;
            case "$target" in
                $(pwd)*) ;; # do nothing
                /*) ln -vsf "$(pwd)$target" "$linkname" ;;
            esac;
        done;
        rm -rf proc dev;
        ln -vsf /proc && ln -vsf /dev
    '''
    print(subprocess.run(cmd_fix_symlinks, shell=True, stdout=subprocess.PIPE,
                         cwd='./distro').stdout.decode('utf-8'), file=sys.stderr)
```

Then, call the executable of your choice (provided by the image) with some special Unix techniques:
- Configure the included fakechroot library with its base directory, which should be the root for your executable.
- Configure the ELF loader that should be used inside the fakechroot to read libraries in the image's root directory.
- With the environment variable LD_PRELOAD, configure the library that wraps libc calls.
- With LD_LIBRARY_PATH, configure the paths to the used libraries inside the extracted file system as absolute paths.
The next code snippet shows an example call of python --version with an absolute path from the outer file system $(pwd)/usr/local/bin/ ($(pwd) displays the path name of the working directory).
The executable itself is not called directly. It is used as the first argument of the loader $(pwd)/usr/lib64/ld-linux-x86-64.so.2 and is executed by the loader from the image. The loader must be configured to use the image's directory (./distro) as a root directory (/). Therefore, we prepend the environment variables (FAKECHROOT_ELFLOADER, FAKECHROOT_BASE, LD_PRELOAD, LD_LIBRARY_PATH), as they are necessary for the call. A Python subprocess can be configured to execute in a given working directory (cwd="./distro"). Inside this base directory, as the root of your extracted file system, pwd will create absolute paths within the processing pipeline.
```python
cmd_python_version = ' '.join([
    'FAKECHROOT_ELFLOADER=$(pwd)/usr/lib64/ld-linux-x86-64.so.2',
    'FAKECHROOT_BASE=$(pwd)',
    'LD_PRELOAD=$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so',
    'LD_LIBRARY_PATH=$(pwd)/usr/local/lib:$(pwd)/usr/local/lib/x86_64-linux-gnu:$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu',
    '$(pwd)/usr/lib64/ld-linux-x86-64.so.2 $(pwd)/usr/bin/python3 --version'
])
sp = subprocess.run(cmd_python_version, shell=True, stdout=subprocess.PIPE,
                    cwd="./distro").stdout.decode('utf-8')
```

Bundle code and compressed file system into a custom step zip file
Create a zip file for your custom Python step which contains all the source code and resources (i.e. the compressed file system) of the custom step.
These files and folders should be located at the root/top of the zip file (no parent folder):
| File or folder | Requirement |
| --- | --- |
| | Mandatory |
| | Mandatory |
| | Mandatory |
| | Mandatory if used as shown in the example |
| | Mandatory if used as shown in the example |
| | Optional |
See also Configuring a pipeline.
Debugging
In the invoke method of your step, you could check if the file system is extracted as expected.
```python
# append current working directory just for information
document['metaData.debug']['current_working_dir'] = os.getcwd()

# append current directory list of the extracted file system
document['metaData.debug']['own_file_system'] = os.listdir('./fs_other_distro')
```

Alternatively, you may print debug information to stderr and check the output of your pipeline in the App Console of Bosch IoT Insights.
```python
# print directory list of the extracted file system
print("Extracted file system root: %s\n" % os.listdir(constant.OWN_FILE_SYSTEM_DIR), file=sys.stderr)
```

Problems and Restrictions
Modify PATH environment
Sometimes you need to modify the PATH environment variable because other executables are expected to be available via $PATH. You can change the PATH environment variable with:

```shell
export PATH=$(pwd)/your-executable/bin:$PATH
```
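In the Python command string, the PATH assignment can be prepended exactly like the other environment variables. A quick sketch of the effect (your-executable/bin is a placeholder, not a real path from the example):

```python
import subprocess

# Prepend a bin directory inside the extracted file system to PATH before the
# actual call; 'your-executable/bin' is a placeholder for your tool's directory.
cmd = 'PATH=$(pwd)/your-executable/bin:$PATH printenv PATH'
new_path = subprocess.run(cmd, shell=True,
                          stdout=subprocess.PIPE).stdout.decode('utf-8')
```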
Missing reference to /proc/self/exe
In other_examples/ghci you will find an example that explains problems caused by /proc/self/exe and how they could be fixed with a specific symlink or by using patchelf to run the command directly.
Invalid elf_header
If you get errors like libc.so is not an ELF file - it has the wrong magic bytes at the start. or libc.so: invalid ELF header, you may also have a look at the example other_examples/ghci.
Restrictions
The mechanism will not work with programs that do not use libc, e.g. because they are built with static linking switched on. Those are pretty rare, since many distributions discourage the use of static linking.
For example, Go is a programming language that normally builds totally self-contained executables by packing all dependencies into the executable itself.
Busybox – a famous all-in-one solution for small shell environments – is another example.
Finally, the Alpine Linux distribution is based on musl libc, a different implementation of libc than the one used by fakechroot (glibc). Such programs are not expected to work when LD_PRELOAD contains the libfakechroot.so, and so file systems exported from Docker containers that use Alpine-based images (often chosen to reduce size) will not work.
Local testing of commands and your own executable
For a faster development cycle, it is also possible to test the executable of your choice, and the commands necessary to run it, locally. For this, you need to run a Docker container on your developer machine that simulates our processing pipeline container. The Insights container is Ubuntu-based (Jammy), and so should be the test container. Then, you can use a terminal and execute the same steps directly that are otherwise executed in your step.py.
The following Dockerfile (located in the directory local-testing) will create such a container, which is very similar to the one used in the production environment of Bosch IoT Insights.
```dockerfile
FROM docker.io/eclipse-temurin:17-jdk-jammy

WORKDIR /tmp
ENV PYTHON_VERSION=3.9.15

# Make python3.9 version available (https://wiki.ubuntuusers.de/Python/manuelle_Installation/)
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential \
        libssl-dev \
        zlib1g-dev \
        libncurses5-dev \
        libncursesw5-dev \
        libreadline-dev \
        libsqlite3-dev \
        libgdbm-dev \
        libdb5.3-dev \
        libbz2-dev \
        libexpat1-dev \
        liblzma-dev \
        tk-dev \
        libffi-dev \
        uuid-dev \
    && curl -k -L https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz -o Python-${PYTHON_VERSION}.tgz \
    && tar -xf Python-${PYTHON_VERSION}.tgz \
    && cd Python-${PYTHON_VERSION} && ./configure && make && make install && cd .. \
    && rm -rf Python-${PYTHON_VERSION} Python-${PYTHON_VERSION}.tgz \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

WORKDIR /

RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y --no-install-recommends \
        curl \
        gcc \
        git \
        inotify-tools \
        net-tools \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

RUN useradd -ms /bin/bash vcap \
    && usermod -d /home/vcap vcap

USER vcap:vcap
WORKDIR /home/vcap
ENV USER_DIR=/home/vcap

CMD ["./bin/bash"]
```

You can execute the following commands inside the directory local-testing.
At first, you need to build the Docker image for the simulated parent container and give it a name (here: pipeline_test_image).
```shell
docker build -t pipeline_test_image .
```

Next, you need to start the Docker container for this pipeline_test_image image and add a mount point to your local developer machine. The mount point provides access to the distro_flat.xz, with all the content from step 1, from within your running container. To summarize, the distro_flat.xz contains a file system that was extracted from a Linux distribution in which your executable normally runs. It is a flat file system, not a container instance: it contains only the physical bits and bytes (which are normally stored on a hard drive), including all libraries and tools that are necessary to run this executable file.
Mounting the .xz archive into the container shortens the test cycle, as that big file is not required to be copied into the container.
Linux:
```shell
docker run --rm -it -v "/$(pwd)/../resources:/resources" pipeline_test_image bash
```

Windows:
```
docker run --rm -it -v "%CD%/../resources:/resources" pipeline_test_image bash
```

Now you are logged in to a running Docker container. Inside this container, which is very similar to our processing pipeline environment, you can test the commands that should be executed from your step.py. You need to execute at least the following three commands:
1. Extract your distro_flat.xz.

   You need to extract the distro_flat.xz, which is provided via a Docker mount point. The distro_flat.xz is located on your developer machine, inside the ./resources directory, and will be mapped inside the Docker container under the absolute path /resources. The following command will create a directory, extract the distro_flat.xz file into it, and finally print the content of this directory.

   ```shell
   mkdir distro && cd distro && tar -xf /resources/distro_flat.xz -C ~/distro && ls -la
   ```

2. Fix all symlinks inside the extracted file system.

   To bring the extracted file system to life, we need to fix all internal symlinks as we do it in the step.py. Therefore, you could execute the following command:

   ```shell
   find $(pwd) -xdev -type l | while read linkname; do
       target=`readlink "$linkname"`;
       case "$target" in
           $(pwd)*) ;; # do nothing
           /*) ln -vsf "$(pwd)$target" "$linkname" ;;
       esac;
   done;
   rm -rf proc dev;
   ln -vsf /proc && ln -vsf /dev
   ```

3. Call your executable.

   At last, you can try to call your executable using the advanced Linux techniques described in this article. You must set the environment variables for the ELF loader, the fakechroot base directory, the preloaded library, and the library search paths. If there are any errors, such as missing libraries, they will be printed directly to the console of the running Docker container. This makes the feedback much faster, and it will be easier to analyze why something is missing. If you can successfully run the executable of your choice using this approach, you still need to translate all the commands that you executed in this container into Python code and repeat them in the step.py of your custom step. The history Unix command displays all the commands that have been executed in your running container.

   The following command will run the newer Python version and should show how the commands must look:

   ```shell
   FAKECHROOT_ELFLOADER=$(pwd)/usr/lib64/ld-linux-x86-64.so.2 \
   FAKECHROOT_BASE=$(pwd) \
   LD_PRELOAD=$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so \
   LD_LIBRARY_PATH=$(pwd)/usr/local/lib:$(pwd)/usr/local/lib/x86_64-linux-gnu:$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu \
   $(pwd)/usr/lib64/ld-linux-x86-64.so.2 $(pwd)/usr/bin/python3 --version
   ```
In addition, if you would rather test your Python step.py code, it is also possible to mount the whole custom-step.zip file that you would normally upload to Bosch IoT Insights. Inside the running container, you have to unpack this zip file. Inside the unzipped directory, you should find the structure with your resources/distro_flat.xz, and inside src you should find your Python code. With this setup, you can test your Python code and verify that the unzipping of the distro_flat.xz works as expected.