JExtract, the magical tool for interoperability between languages

Page Contents

In today’s computing world, business applications run on varied software stacks, where the delegation of responsibility is done on the basis of what a language/library does best.

E.g.

Python is widely used for data science and machine learning;
JavaScript is the preferred language for front-end web development and interactive web applications;
Java is the best for enterprise applications, Android app development and large-scale systems ;
C++ is good for high-performance applications, game development and embedded systems;
Go is used for infrastructure automation and cloud computing.

As software stacks continue to evolve and coordinate, there is an ongoing need for better and faster tools to orchestrate the coordination between different languages and software libraries.

In this post we will discuss one such tool, named Jextract, Jextract provides the necessary glue and connections to make different languages work together.

Importance of Folders in the Jextract Directives

For any language, whether it is Python, C, C++ or R., there are some common folders that are crucial for organizing and managing code, dependencies, system settings etc.

1. The LIB directory contains collections of pre-compiled code, called shared libraries, which multiple programs can use simultaneously.

Typically a LIB folder can contain prebuilt language structures, like collections, cursors, xml and json, and routines like calendar, mailbox, tokenizer, fractions etc.

2. The INCLUDE directory contains header files that contain declarations (prototypes) of functions, classes, and other elements that are used in a program’s code.

3. The ETC directory can contain configuration files that control the network, authentication, database and package settings.

4. Based on the operating system, some shared libraries are kept in compiled form. On Linux you can find .SO files in /etc/ld.so.cache. On Windows you can find DLL files in the DLLs folder.

Using the above information, Jextract generates Java bindings from native library headers, internally using the Foreign Function and Memory API (FFM).

What Does Jextract Do?

Seasoned Java programmers will remember that in Java RMI, there were Stub and Skeleton classes. Stubs were client-side proxies that represented the remote object whose methods were to be called. Skeletons were server-side proxies that handled the received calls from Stubs. A somewhat similar approach is used here also.

Jextract parses the native libraries, including their structure and header files, and then generates the necessary bindings (Java classes) which act as a bridge between your Java code and those native libraries. Using the classes generated by Jextract, to and fro function calling, and passing of parameters and results, becomes easily possible.

Thanks to the output of Jextract, we can directly use a Java model of the native libraries we want to use in our code. Jextract internally uses the Foreign Function and Memory API, with the Foreign Linker API.

Jextract-generated classes handle native symbol lookup, calculate function descriptors from Python or C declaration methods, and present simple Java static methods to invoke the underlying Python or C functions.

Typically, we provide Jextract with:

(i) the paths of the native libraries to use(include). These must be findable via the search path in your operating system; e.g. LD_LIBRARY_PATH (Linux), DYLD_LIBRARY_PATH (Mac), or PATH (Windows).

(ii) instructions on which native libraries to load,

(iii) the main header file of the native library we want to generate bindings for.

If we want to use multiple libraries, say Python as well as C, we need to run Jextract twice, as the bindings will be different for different languages.

Some Switches Used With JExtract

Jextract switches include:

-l (lowercase L) to specify the shared library that should be loaded by the generated header class. e.g. opengl32 to load an open graphics library, or the absolute path of a Python shared library like Python.so or Python.dll;
-I (uppercase I), to specify the header file search directories.This switch is typically applied multiple times to include multiple locations where header files are stored, e.g. Python or C header files with the underlying operating system’s header files. All these directories are included in search paths in the order in which they appear in the Jextract command;
–output <path> Folder to place the generated files;
-t, –target-package <package> Name of the package to which the generated classes will belong.

Once the bindings are generated, the inter-language and inter-library calls are executed in a well-orchestrated manner. The classes generated by Jextract will have a single entry Point. This class is generally named after the main header file that is passed to Jextract, e.g. Python_h OR C_h.

Sample Jextract command:

jextract -t my.opengl --output /home/myuser/gensrc \
-l /usr/lib/GL \
-l /usr/lib64/GLU \
-l /usr/lib64/glut \
/usr/include/GL/glut.h

You can explicitly set a name of your choice via the “–header-class-name” option. This entry-point class also contains methods to access functions, global variables, primitive typedefs, macros, enums and layouts for builtin language types.

In other words, for each common builtin datatype of the target (native library) language, a set of appropriate memory layouts is generated, so that the programmer (you) does not have to worry about data type mismatches or interpretational errors.

Apart from the main header file, Jextract generates additional classes for declarations for structs, unions, function pointer types, and typedefs of struct or union types.

For example, if you have a global variable declared in a header file, int myNum, Jextract will allocate memory for this variable, myNum$layout will be the FFM memory layout of this global variable and myNum$segment will be the address of this global variable.

Similarly, for each function pointer type found in the header files, Jextract generates a separate class.

A Word of Caution Regarding JExtract

Sometimes there could be differences between operating systems regarding the way a data type is internally represented. For example, the sizes of long and long double data types can differ between Linux and Windows, with Linux typically using 8 bytes for long and 16 bytes for long double, while Windows uses 4 bytes for long and 8 bytes for long double. So, you should avoid using Jextract bindings on an operating system that is different to the one where Jextract originally generated these bindings.

Bindings generated on Linux might not work well on Windows.

A Bird’s Eye View of How Interoperability and Jextract Work at the Execution Level.

Implementing interoperability at the execution level works as follows:

Step 1

The first step is to load the shared library (e.g., a .so/.dll/.dylib file), which contains code, variables and further linkages. After loading, the functions which Java wants to call are “looked up” by the SymbolLookup interface, i.e. it retrieves the address of a symbol from one or more libraries.

A symbol is a named entity, such as a function or a global variable. SymbolLookup uses the find(String nameOfTheSymbol) method to search for and return the address of the symbol in the target library. The address of a symbol is modeled as a zero-length memory segment.

Step 2

Once the address is obtained, it is used to either:-

(i) Pass the address to a Linker, which creates a downcall method handle. This handle can then be used to call the foreign function at the segment’s address.

You need to pass the “–enable-native-access=ALL-UNNAMED” switch to the JVM, as the downcall handle is a restricted method of the Java platform.

Java imposes this restriction, as it needs to make sure that you know what you’re doing. You need to call it carefully with the required parameters, so as to not corrupt the memory or crash the JVM.

(ii) Pass this address to another similar downcall method handle, as an argument to the underlying foreign function, to create a chain of methods.

This is done by creating an UpcallStub.

(iii) Store this address inside some other memory segment (allocated via Arena::allocate) to be used later, or to create MemoryLayout instances and access other structs.

(iv) Use it to access a memory region storing some global variable.

Step 3

Using the Jextract-generated wrapper functions, make the actual calls to the native functions.

Execute the native function, monitor the execution for any errors/warnings, and collect the return values.

These return values might need to be reformatted to be used easily by Java.

Overview of Classes Generated by Jextract

initproc.java – Allocates a memory segment using a Foreign Function and creates an upcall stub. It’s also used to invoke the upcall stub . The upcall stub is discussed above.

There are classes that are grouped based on aspects they manage, regarding call dynamics and interoperability. These include:

Python_h_1$_ThreadState_xxx – to manage and report the thread state while Python functions are executed;
Python_h_1$_PyTime_xxx – to manage the clock and timing aspects;
Python_h_1$_PyInterpreterState_xxx – These manage the Python interpreter as Python functions execute;
Function classes like initproc/sendfunc/destructor – Allocates a new upcall stub, models the function pointer signature as a functional Java interface, and invokes the upcall stub with given parameters;
Object classes that act as a bridge between Java and C/Python objects, including ByteArrayObject/FILE/FloatObject/LongObject etc.

The JExtract Tracing Feature

If all this becomes too cryptic, or if you encounter errors, you can use the tracing feature of Jextract. The tracing helps to diagnose bugs/crashes by inspecting the parameters passed to a native call.

You just need to pass the -Djextract.trace.downcalls=true flag as a VM argument to the launcher when you start the application that uses the generated bindings.

A Sample Java Program That Calls a C Language Math Function

Let’s call the sqrt() function of C/C++ language from Java code.

We use Jextract to generate the bindings for mathematical functions in, say, libm.so, and provide math.h for further linking.

jextract -I <path-to-libm.so> -t com.example.mymath math.h

Then we create a simple java program which will call the C math (math.h) Library.

import static org.unix.stdlib_h.*;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class SqrtMain {
    public static void main(String[] args) {
       
        int myNumber = 1;
int square_root_myNum = 1;
// Allocate off-heap memory and store myNumber in it.
        // math_h.C_INT is a constant generated by jextract. Memory sufficient to store a C language integer will be allocated.

        try (Arena arena = Arena.ofConfined()) {        
            MemorySegment myNumberSegment = arena.allocateFrom(C_INT, myNumber);           
         
            // Call sqrt
            square_root_myNum = sqrt(myNumberSegment);     
            System.out.println("Square root of" + myNumber + "is = " + square_root_myNum);       
        }
    }
}

Conclusion

JExtract hides some of the underlying details of the Foreign Linker API, and saves you writing boilerplate code. Jextract also simplifies things further, as it parses a Python/C header file to generate a Java class, which presents a simpler Java static method to invoke the underlying Python/C function(s). You need to run JExtract again only if the native libraries or JExtract itself have an updated stable version.

JExtract is an encompassing and versatile tool to speed up your development process and make life easier for you as a modern programmer.

JExtract, the magical tool for interoperability between languages

Importance of Folders in the Jextract Directives

What Does Jextract Do?

Some Switches Used With JExtract

A Word of Caution Regarding JExtract

A Bird’s Eye View of How Interoperability and Jextract Work at the Execution Level.

Step 1

Step 2

Step 3

Overview of Classes Generated by Jextract

The JExtract Tracing Feature

A Sample Java Program That Calls a C Language Math Function

Conclusion

You may also like

Share your Thoughts!Cancel reply

“Production is Secure. Is Production Troubleshooting Secure?’ Webinar

Production is Secure. Is Troubleshooting Process Secure?

Securing Production Troubleshooting with yCrash Audit Logs

About

Popular Topics

Troubleshooting Tools

Importance of Folders in the Jextract Directives

What Does Jextract Do?

Some Switches Used With JExtract

A Word of Caution Regarding JExtract

A Bird’s Eye View of How Interoperability and Jextract Work at the Execution Level.

Step 1

Step 2

Step 3

Overview of Classes Generated by Jextract

The JExtract Tracing Feature

A Sample Java Program That Calls a C Language Math Function

Conclusion

You may also like

Share your Thoughts!Cancel reply

About

Popular Topics

Troubleshooting Tools

Discover more from yCrash