With the following method, you can use your own existing Docker image or even create a new one to run a custom step. Theoretically, you can use any executable which can run on Linux.
We have described the process along with some examples in the following package: pipeline-example-embedded-linux-binary.zip
Below, you will find the general instructions on how to embed your own Docker image based on a Python example. The very same instructions can also be found in the top-level README.md file of the package above.
Furthermore, we have implemented examples using other executables. They are located in the other_examples folder of the archive above. Each of them has a specific README.md file located in its respective subfolder.
Extracting and compressing a file system from a Docker image
Build an example Docker image with a newer Python version from a Dockerfile (inside ./resources). The Docker image might install build-essential and must install fakechroot in order to decouple the file system of this Docker image from the host's file system in the pipeline processing container.
Use our provided Dockerfile for testing, or use your own Docker image as the FROM base:
```dockerfile
# Uses Python (based on ubuntu:22.04) or try to use your own docker image (based on libc)
FROM docker.io/ubuntu:22.04

# add build-essential (useful for compiling Python modules during pip install)
RUN apt-get update && apt-get install -y --no-install-recommends build-essential python3.11 python3-pip \
    && apt-get clean && rm -rf /var/lib/apt/lists/* # Clean up to keep the image size as small as possible

# add fakechroot to allow decoupling from host operating system without root permission
RUN apt-get update && apt-get install -y --no-install-recommends fakechroot \
    && apt-get clean && rm -rf /var/lib/apt/lists/* # Clean up to keep the image size as small as possible

CMD ["./bin/bash"]
```

Open a terminal (e.g. Bash or Cmd) for the example in the directory of this README.
Build a Docker image and give it a name (e.g. here it is python_image), with the following command:
```shell
docker build -t python_image -f ./resources/Dockerfile .
```
Create the Docker container without starting it, in order to build the file system of the image.
Linux:
```shell
CONTAINER_ID=$(docker create python_image) && echo $CONTAINER_ID
```
Windows Cmd:
```
docker create python_image > CONTAINER_ID
set /P CONTAINER_ID=<CONTAINER_ID
del CONTAINER_ID
echo %CONTAINER_ID%
```

Export the file system of the newly created image into a compressed file (we use .xz for minimal file size).
Linux:
```shell
docker export $CONTAINER_ID | xz > ./resources/distro_flat.xz
```
Windows Cmd:
For compression, you need an xz tool available in your %PATH% environment.

```
docker export %CONTAINER_ID% | xz > .\resources\distro_flat.xz
```
If you have GitBash (Mingw64) installed, you could use the xz.exe from there with the following commands:
```
rem // Store path to git.exe in variable GIT_EXE_PATH
where git > GIT_EXE_PATH
set /P GIT_EXE_PATH=<GIT_EXE_PATH
del GIT_EXE_PATH
rem // Store current directory and switch to git\mingw64\bin directory
pushd %GIT_EXE_PATH%\..\..\mingw64\bin
rem // Save path of current directory %CD% (where xz.exe is located)
set GIT_TOOLS_DIR=%CD%
rem // Restore original directory
popd
docker export %CONTAINER_ID% | "%GIT_TOOLS_DIR%"\xz.exe > .\resources\distro_flat.xz
```

The resulting distro_flat.xz can then be embedded into a custom step package for use within an Insights processing pipeline. To use the compressed file system of the desired distribution, some more information is needed and must be extracted from the image.
The following command will use the script create_environment.sh to generate the output for our constant.py. This script must be executed inside your image. With the following command, the Docker image is executed with a local mount to the scripts directory and will execute the script to collect the paths used in the example.
Linux:
```shell
docker run --rm -it -v "/$(pwd)/scripts:/scripts" python_image bash -c "chmod +x ./scripts/create_environment.sh && ./scripts/create_environment.sh"
```
Windows Cmd:
```
docker run --rm -it -v "%cd%/scripts:/scripts" python_image bash -c "chmod +x ./scripts/create_environment.sh && ./scripts/create_environment.sh"
```
If it does not work with your Linux distribution, or you would rather do it on your own, the following commands might help you find out what is wrong or what is required to start an executable of your choice. The script executes the following commands in a row and builds the variables used in constant.py. You may also use this script to retrieve the necessary environment paths on your own, adapt them, and use them in your own custom step if you do not use the example step.py, constant.py and embedded_linux.py.
Find the paths in the image for the ELF loader and for the fakechroot library. These paths must be used in your step.py to call the executable of your choice in the image inside the Insights processing pipeline container.
```shell
docker run --rm -it python_image bash -c "find / -name ld-*.so* -or -name libfakechroot.so | sed -u 's/^/\$\(pwd\)/'"

# Expected output
$(pwd)/usr/lib64/ld-linux-x86-64.so.2
$(pwd)/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so

# Expected output on MacOS
$(pwd)/usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
$(pwd)/usr/lib/aarch64-linux-gnu/fakechroot/libfakechroot.so
$(pwd)/usr/lib/ld-linux-aarch64.so.1
```

These values are used to create environment variables for FAKECHROOT_ELFLOADER and for LD_PRELOAD. In our example, these paths are stored in the file constant.py as variables named OWN_ELF_LOADER and OWN_PRELOAD, and are used by the helper methods in embedded_linux.py which are called in step.py.
Check if the following command, which lists all files in ld.so.conf in the image, is working. This command could also be executed locally on your own file system in your Docker container and should list the libraries in your container.
```shell
docker run --rm -it python_image bash -c "ls /etc/ld.so.conf /etc/ld.so.conf.d/* | xargs cat | grep -v -E -e '^\s*(include|#|$)|fakechroot'"

# Expected output
/usr/local/lib
/usr/local/lib/x86_64-linux-gnu
/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu

# Expected output on MacOS
/usr/local/lib/aarch64-linux-gnu
/lib/aarch64-linux-gnu
/usr/lib/aarch64-linux-gnu
/usr/local/lib
```

Those values, joined with ':' colons and each prepended with $(pwd), are used in the example in constant.py as the variable named OWN_LIBRARY_PATHS and provide the environment variable LD_LIBRARY_PATH. This environment variable is necessary in your step.py and must be set before calling the executable of your choice.
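The shell pipeline in the next step performs this join automatically; the same transformation can also be sketched in plain Python. The directory list below simply reuses the x86_64 values from the expected output above, and OWN_LIBRARY_PATHS mirrors the variable name from the example's constant.py:

```python
# Library directories reported by ld.so.conf inside the image (x86_64 example values)
image_lib_dirs = [
    '/usr/local/lib',
    '/usr/local/lib/x86_64-linux-gnu',
    '/lib/x86_64-linux-gnu',
    '/usr/lib/x86_64-linux-gnu',
]

# Prepend $(pwd) to each path and join with ':' to form the LD_LIBRARY_PATH value
OWN_LIBRARY_PATHS = ':'.join('$(pwd)' + d for d in image_lib_dirs)
```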
We could build the LD_LIBRARY_PATH variable for the image (without the fakechroot library path).
```shell
docker run --rm -it python_image bash -c "ls /etc/ld.so.conf /etc/ld.so.conf.d/* | xargs cat | grep -v -E -e '^\s*(include|#|$)|fakechroot' | sed -u 's/^/\$\(pwd\)/' | tr '\n' ':' | rev | cut -c 2- | rev | xargs -n1 printf \"LD_LIBRARY_PATH=%s\""

# Expected output
LD_LIBRARY_PATH=$(pwd)/usr/local/lib:$(pwd)/usr/local/lib/x86_64-linux-gnu:$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu

# Expected output on MacOS
LD_LIBRARY_PATH=$(pwd)/usr/local/lib/aarch64-linux-gnu:$(pwd)/lib/aarch64-linux-gnu:$(pwd)/usr/lib/aarch64-linux-gnu:$(pwd)/usr/local/lib
```

In the example embedded_linux.py, we join the environment variables with space characters. Later on, we use them in the helper method run, which executes the command string in a Python subprocess.run.
```python
env_variables = ' '.join([
    'FAKECHROOT_ELFLOADER=' + constant.OWN_ELF_LOADER,
    'FAKECHROOT_BASE=$(pwd)',
    'LD_PRELOAD=' + constant.OWN_PRELOAD,
    'LD_LIBRARY_PATH=' + constant.OWN_LIBRARY_PATHS
])
```

The resulting command which is executed in the processing pipeline should look as follows:
```shell
# Environment variables for the loader that executes your executable
FAKECHROOT_ELFLOADER=$(pwd)/usr/lib64/ld-linux-x86-64.so.2
FAKECHROOT_BASE=$(pwd)
LD_PRELOAD=$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so
LD_LIBRARY_PATH=$(pwd)/usr/local/lib:$(pwd)/usr/local/lib/x86_64-linux-gnu:$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu

# The executable is the argument of the loader (ld-2.35.so). The loader will execute it.
$(pwd)/usr/lib64/ld-linux-x86-64.so.2 <executable_of_your_choice>
```

In Python, we use the module subprocess to execute the command in a separate process.
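A minimal sketch of what such a run helper could look like — the authoritative version ships in the example's embedded_linux.py; this simplified variant is an assumption, not the shipped code:

```python
import subprocess

def run(env_variables, command, cwd='./distro'):
    # Prefix the command with the prepared environment variable assignments and
    # execute it in a shell, with the extracted file system as working directory.
    # Capturing stdout also prevents the output from interfering with the pipeline.
    full_command = env_variables + ' ' + command
    result = subprocess.run(full_command, shell=True,
                            stdout=subprocess.PIPE, cwd=cwd)
    return result.stdout.decode('utf-8')
```

A call such as run(env_variables, '$(pwd)/usr/lib64/ld-linux-x86-64.so.2 <executable_of_your_choice>') would then execute the loader with the configured environment.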
Now, we need to package our own file system and the environment variables, prepared as above, in a custom step. Therefore, we need to extract the file system inside the custom step, fix the symbolic links and finally call our own executable.
Using a compressed file system in a custom step inside the pipeline
First, you need to extract your own compressed file system inside your custom step runtime. Inside your custom step, you need to use something like the following code snippet to extract your compressed file system into a newly created directory (here named distro).
```python
def unpack_xz():
    os.mkdir('./distro')
    os.chdir('./distro')
    with tarfile.open('../resources/distro_flat.xz') as f:
        f.extractall('.')
    os.chdir('./..')
```

Next, all the symbolic links inside the extracted file system have to be relocated so that they point to the outer absolute path of your image's file system, because the Linux kernel will resolve them without the injected libc or fakechroot. Any symlink that points to an absolute file outside of your file system should be redirected to a new absolute path inside your extracted root directory, except for /proc and /dev. They must point to the outer file system.
You should consume the stdout output of your subprocess; otherwise, the pipeline processing may interfere with it.
```python
def fix_symbolic_links():
    cmd_fix_symlinks = '''
        find $(pwd) -xdev -type l | while read linkname; do
            target=`readlink "$linkname"`;
            case "$target" in
                $(pwd)*) ;; # do nothing
                /*) ln -vsf "$(pwd)$target" "$linkname" ;;
            esac;
        done;
        rm -rf proc dev;
        ln -vsf /proc && ln -vsf /dev
    '''
    print(subprocess.run(cmd_fix_symlinks, shell=True, stdout=subprocess.PIPE,
                         cwd='./distro').stdout.decode('utf-8'), file=sys.stderr)
```

Then, call the executable of your choice (provided by the image) with some special Unix techniques:
- Configure the included fakechroot library with its base directory, which should be the root for your executable.
- Configure the ELF loader that should be used inside the fakechroot to read libraries in the image's root directory.
- With the environment variable LD_PRELOAD, configure the library that wraps libc calls.
- With LD_LIBRARY_PATH, configure the paths to the used libraries inside the extracted file system as absolute paths.
The next code snippet shows an example call of python --version with an absolute path from the outer file system $(pwd)/usr/local/bin/ ($(pwd) displays the path name of the working directory).
The executable itself is not called directly. It is used as the first argument of the loader $(pwd)/usr/lib64/ld-linux-x86-64.so.2 and is executed by the loader from the image. The loader must be configured to use the image's directory (./distro) as a root directory (/). Therefore, we prepend the environment variables (FAKECHROOT_ELFLOADER, FAKECHROOT_BASE, LD_PRELOAD, LD_LIBRARY_PATH), as they are necessary for the call. A Python subprocess can be configured to execute in a given working directory (cwd="./distro"). Inside this base directory, as the root of your extracted file system, pwd will create absolute paths within the processing pipeline.
```python
cmd_python_version = ' '.join([
    'FAKECHROOT_ELFLOADER=$(pwd)/usr/lib64/ld-linux-x86-64.so.2',
    'FAKECHROOT_BASE=$(pwd)',
    'LD_PRELOAD=$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so',
    'LD_LIBRARY_PATH=$(pwd)/usr/local/lib:$(pwd)/usr/local/lib/x86_64-linux-gnu:$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu',
    '$(pwd)/usr/lib64/ld-linux-x86-64.so.2 $(pwd)/usr/bin/python3 --version'
])
sp = subprocess.run(cmd_python_version, shell=True, stdout=subprocess.PIPE,
                    cwd="./distro").stdout.decode('utf-8')
```

Bundle code and compressed file system into a custom step zip file
Create a zip file for your custom Python step which contains all the source code and resources (i.e. the compressed file system) of the custom step.
These files and folders should be located at the root/top of the zip file (no parent folder):
| File or folder | Requirement |
| --- | --- |
| | Mandatory |
| | Mandatory |
| | Mandatory |
| | Mandatory if used as shown in the example |
| | Mandatory if used as shown in the example |
| | Optional |
See also Configuring a pipeline.
Debugging
In the invoke method of your step, you could check if the file system is extracted as expected.
```python
# append current working directory just for information
document['metaData.debug']['current_working_dir'] = os.getcwd()

# append current directory list of the extracted file system
document['metaData.debug']['own_file_system'] = os.listdir('./fs_other_distro')
```

Alternatively, you may print debug information to stderr and check the output of your pipeline in the App Console of Bosch IoT Insights.
```python
# print directory list of the extracted file system
print("Extracted file system root: %s\n" % os.listdir(constant.OWN_FILE_SYSTEM_DIR), file=sys.stderr)
```

Problems and Restrictions
Modify PATH environment
Sometimes you need to modify the PATH environment variable because other executables are expected to be available via $PATH. You can change the PATH environment variable with:

```shell
export PATH=$(pwd)/your-executable/bin:$PATH
```
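In the Python command string, the PATH assignment can be prepended exactly like the other environment variables. A quick sketch of the effect (your-executable/bin is a placeholder, not a real path from the example):

```python
import subprocess

# Prepend a bin directory inside the extracted file system to PATH before the
# actual call; 'your-executable/bin' is a placeholder for your tool's directory.
cmd = 'PATH=$(pwd)/your-executable/bin:$PATH printenv PATH'
new_path = subprocess.run(cmd, shell=True,
                          stdout=subprocess.PIPE).stdout.decode('utf-8')
```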
Missing reference to /proc/self/exe
In other_examples/ghci you will find an example that explains problems caused by /proc/self/exe and how they could be fixed with a specific symlink or by using patchelf to run the command directly.
Invalid elf_header
If you get errors like libc.so is not an ELF file - it has the wrong magic bytes at the start. or libc.so: invalid ELF header, you may also have a look at the example other_examples/ghci.
Restrictions
The mechanism will not work with programs that do not use libc, e.g. because they are built with static linking switched on. Those are pretty rare, since many distributions discourage the use of static linking.
For example, Go is a programming language that normally builds totally self-contained executables by packing all dependencies into the executable itself.
Busybox – a famous all-in-one solution for small shell environments – is another example.
Finally, the Alpine Linux distribution is based on musl libc, a different implementation of libc than the one used by fakechroot (glibc). Such programs are not expected to work when LD_PRELOAD contains the libfakechroot.so, and so file systems exported from Docker containers that use Alpine-based images (often chosen to reduce size) will not work.
Local testing of commands and your own executable
For a faster development cycle, it is also possible to test the executable of your choice, and the commands necessary to run it, locally. For this, you need to run a Docker container on your developer machine that simulates our processing pipeline container. The Insights container is Ubuntu-based (Jammy), and so should be the test container. Then, you can use a terminal and execute the same steps directly that are otherwise executed in your step.py.
The following Dockerfile (located in the directory local-testing) will create such a container, which is very similar to the one used in the production environment of Bosch IoT Insights.
```dockerfile
FROM docker.io/eclipse-temurin:17-jdk-jammy

WORKDIR /tmp
ENV PYTHON_VERSION=3.9.15

# Make python3.9 version available (https://wiki.ubuntuusers.de/Python/manuelle_Installation/)
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential \
        libssl-dev \
        zlib1g-dev \
        libncurses5-dev \
        libncursesw5-dev \
        libreadline-dev \
        libsqlite3-dev \
        libgdbm-dev \
        libdb5.3-dev \
        libbz2-dev \
        libexpat1-dev \
        liblzma-dev \
        tk-dev \
        libffi-dev \
        uuid-dev \
    && curl -k -L https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz -o Python-${PYTHON_VERSION}.tgz \
    && tar -xf Python-${PYTHON_VERSION}.tgz \
    && cd Python-${PYTHON_VERSION} && ./configure && make && make install && cd .. \
    && rm -rf Python-${PYTHON_VERSION} Python-${PYTHON_VERSION}.tgz \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

WORKDIR /

RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y --no-install-recommends \
        curl \
        gcc \
        git \
        inotify-tools \
        net-tools \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

RUN useradd -ms /bin/bash vcap \
    && usermod -d /home/vcap vcap

USER vcap:vcap
WORKDIR /home/vcap
ENV USER_DIR=/home/vcap

CMD ["./bin/bash"]
```

You can execute the following commands inside the directory local-testing.
At first, you need to build the Docker image for the simulated parent container and give it a name (here: pipeline_test_image).
```shell
docker build -t pipeline_test_image .
```

Next, you need to start the Docker container for this pipeline_test_image image and add a mount point to your local developer machine. The mount point provides access to the distro_flat.xz, with all the content from step 1, from within your running container. To summarize, the distro_flat.xz contains a file system that was extracted from a Linux distribution in which your executable normally runs. It is a flat file system, not a container instance: it contains only the physical bits and bytes (which are normally stored on a hard drive), including all libraries and tools that are necessary to run this executable file.
Mounting the .xz archive into the container shortens the test cycle, as that big file is not required to be copied into the container.
Linux:
```shell
docker run --rm -it -v "/$(pwd)/../resources:/resources" pipeline_test_image bash
```

Windows:
```
docker run --rm -it -v "%CD%/../resources:/resources" pipeline_test_image bash
```

Now you are logged in to a running Docker container. Inside this container, which is very similar to our processing pipeline environment, you can test the commands that should be executed from your step.py. You need to execute at least the following three commands:
1. Extract your distro_flat.xz.

   You need to extract the distro_flat.xz, which is provided via a Docker mount point. The distro_flat.xz is located on your developer machine, inside the ./resources directory, and will be mapped inside the Docker container under the absolute path /resources. The following command will create a directory, extract the distro_flat.xz file into it, and finally print the content of this directory.

   ```shell
   mkdir distro && cd distro && tar -xf /resources/distro_flat.xz -C ~/distro && ls -la
   ```

2. Fix all symlinks inside the extracted file system.

   To bring the extracted file system to life, we need to fix all internal symlinks as we do it in the step.py. Therefore, you could execute the following command:

   ```shell
   find $(pwd) -xdev -type l | while read linkname; do
       target=`readlink "$linkname"`;
       case "$target" in
           $(pwd)*) ;; # do nothing
           /*) ln -vsf "$(pwd)$target" "$linkname" ;;
       esac;
   done;
   rm -rf proc dev;
   ln -vsf /proc && ln -vsf /dev
   ```

3. Call your executable.

   At last, you can try to call your executable using the advanced Linux techniques described in this article. You must set the environment variables for the ELF loader, the fakechroot base directory, the preloaded library, and the library search paths. If there are any errors, such as missing libraries, they will be printed directly to the console of the running Docker container. This makes the feedback much faster, and it will be easier to analyze why something is missing. If you can successfully run the executable of your choice using this approach, you still need to translate all the commands that you executed in this container into Python code and repeat them in the step.py of your custom step. The history Unix command displays all the commands that have been executed in your running container.

   The following command will run the newer Python version and should show how the commands must look:

   ```shell
   FAKECHROOT_ELFLOADER=$(pwd)/usr/lib64/ld-linux-x86-64.so.2 \
   FAKECHROOT_BASE=$(pwd) \
   LD_PRELOAD=$(pwd)/usr/lib/x86_64-linux-gnu/fakechroot/libfakechroot.so \
   LD_LIBRARY_PATH=$(pwd)/usr/local/lib:$(pwd)/usr/local/lib/x86_64-linux-gnu:$(pwd)/lib/x86_64-linux-gnu:$(pwd)/usr/lib/x86_64-linux-gnu \
   $(pwd)/usr/lib64/ld-linux-x86-64.so.2 $(pwd)/usr/bin/python3 --version
   ```
In addition, if you would rather test your Python step.py code, it is also possible to mount the whole custom-step.zip file that you would normally upload to Bosch IoT Insights. Inside the running container, you have to unpack this zip file. Inside the unzipped directory, you should find the structure with your resources/distro_flat.xz, and inside src you should find your Python code. With this setup, you can test your Python code and verify that the unzipping of the distro_flat.xz works as expected.