Published:	2025-01-30
Tags:	LLVM

One difficulty when working with LLVM backends is the sheer number of files, classes and ultimately lines of code that need to be present in order to get an LLVM backend going. The aim of this post is to give a very brief overview of the necessary classes.

The backend referenced here can be found in my GitHub repository.

The structure on a high level

The target directory itself contains all the code that is necessary to go from a target-independent representation to a valid machine-specific instruction stream.

Simple subdirectory structure

MCTargetDesc: Contains classes used to serialize the instruction stream to binary machine code or human-readable assembly code.
TargetInfo: Mainly contains the TargetInfo class, which supplies the Target as a singleton and registers the backend with the LLVM framework.
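
The singleton-plus-registration idea behind TargetInfo can be illustrated without any LLVM headers. The following is a minimal, self-contained sketch: `Target`, `TargetRegistry`, `getTheLEGTarget` and `initializeLEGTargetInfo` are simplified stand-ins written for this post, not LLVM's real classes or signatures.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a target description (hypothetical, not LLVM's class).
struct Target {
  std::string Name;
  std::string Description;
};

// Stand-in for the global registry the framework searches by name.
class TargetRegistry {
public:
  // Returns the registered target with the given name, or nullptr.
  static const Target *lookup(const std::string &Name) {
    for (const Target *T : targets())
      if (T->Name == Name)
        return T;
    return nullptr;
  }

  // Fills in the target and makes it discoverable; repeated calls are ignored.
  static void registerTarget(Target &T, std::string Name, std::string Desc) {
    if (lookup(Name) != nullptr)
      return;
    T.Name = std::move(Name);
    T.Description = std::move(Desc);
    targets().push_back(&T);
  }

private:
  static std::vector<const Target *> &targets() {
    static std::vector<const Target *> Targets;
    return Targets;
  }
};

// The backend's single Target instance, created on first use.
Target &getTheLEGTarget() {
  static Target TheLEGTarget;
  return TheLEGTarget;
}

// Entry point the framework would call once to register the backend.
void initializeLEGTargetInfo() {
  TargetRegistry::registerTarget(getTheLEGTarget(), "leg", "LEG example backend");
}
```

After `initializeLEGTargetInfo()` has run, `TargetRegistry::lookup("leg")` returns the address of the same object that `getTheLEGTarget()` hands out. That is the point of the singleton: every part of the backend refers to one shared Target.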

TableGen files

Backends do not split their TableGen definitions in a uniform way, but most of them tend to spread the definitions over a handful of files along these lines:

File	Content
LEG.td	Top-level TableGen file. Includes all other TableGen files and instantiates only a couple of high-level structs
LEGDevices.td	Contains optional architectural features and processor definitions
LEGRegisterInfo.td	Contains all structural information regarding registers. This includes register names, subregister information and register classes
LEGInstrFormat.td	Describes instruction formats and operands
LEGInstrInfo.td	Describes the individual instructions and defines instruction selection patterns
LEGCallingConv.td	Describes the calling convention(s)

Classes in the base directory

The classes in the base directory contain everything necessary to go from the general target-independent directed acyclic graph (DAG) to a target-specific instruction stream.

Class / file	What it does
LEGAsmPrinter	Assists with outputting human-readable assembly
LEGExpandPseudo	Replaces pseudo-instructions (which simplify instruction selection) with valid machine instructions
LEGFrameLowering	Handles function frame lowering - e.g. saving and restoring callee saved registers in prologue and epilogue
LEGInstrInfo	Handles register copying, spilling and reloading. Also handles branch optimization
LEGDAGToDAGISel	Performs instruction selection on the target-independent DAG. Contains selection functions for complex patterns declared in LEGInstrFormat.td
LEGISelLowering	Sets up instruction selection, taking target-specific features and instructions like `divmul` into account. Also contains the code to lower calls (setting up the arguments etc.), returns (putting return values into the right locations), and addresses
LEGMCInstLower	Translates between the architecture-specific operands and their MC versions
LEGRegisterInfo	Assists LLVM in selecting registers as well as lowering operands that access local variables
LEGSubtarget	Holds information about active CPU features
LEGTargetMachine	Registers the backend-specific optimization passes with LLVM
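
To make the role of a pass like LEGExpandPseudo more concrete, here is a minimal, self-contained sketch of pseudo-instruction expansion. Everything in it is an assumption made for illustration: the `MachineInstr` struct is a toy model rather than LLVM's class, and the opcodes `MOVi32`, `MOVLOi16` and `MOVHIi16` are hypothetical, not taken from the LEG backend. The sketch assumes a pseudo-instruction that loads a 32-bit immediate and has to be split into two real instructions carrying 16 bits each.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcodes; MOVi32 is a pseudo with no machine encoding.
enum class Opcode { ADD, MOVLOi16, MOVHIi16, MOVi32 };

// Toy instruction model: one destination register and one immediate.
struct MachineInstr {
  Opcode Op;
  unsigned DstReg;
  std::uint32_t Imm;
};

// Rewrites every MOVi32 pseudo into a low-half/high-half pair and
// copies all other instructions through unchanged.
std::vector<MachineInstr> expandPseudos(const std::vector<MachineInstr> &In) {
  std::vector<MachineInstr> Out;
  for (const MachineInstr &MI : In) {
    if (MI.Op != Opcode::MOVi32) {
      Out.push_back(MI);
      continue;
    }
    // Lower 16 bits first, then the upper 16 bits into the same register.
    Out.push_back({Opcode::MOVLOi16, MI.DstReg, MI.Imm & 0xFFFFu});
    Out.push_back({Opcode::MOVHIi16, MI.DstReg, MI.Imm >> 16});
  }
  return Out;
}
```

Instruction selection only has to match a single `MOVi32` node in this model. The awkward two-instruction sequence is produced later, by a small pass that sees no other pseudo-instructions.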

Classes in MCTargetDesc

These classes pick up where those in the base directory left off. Here we already start with instructions for our architecture, and the goal is to serialize them either as human-readable assembly or as machine-executable binary code.

Class	What it does
LEGAsmBackend	Handles fixups. These are values that are not known at the time the instruction is encoded and thus need to be fixed up later. Instruction-pointer-relative operands are among the values that need fixups
LEGELFObjectWriter	Translates fixups into relocations
LEGInstPrinter	Prints instructions and operands as human-readable assembly
LEGMCCodeEmitter	Encodes instructions and operands into binary machine code
LEGTargetDesc	Registers the machine-code-generating aspects of the backend
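
The fixup mechanism handled by LEGAsmBackend can be sketched in a few lines. This is a self-contained toy model, not LLVM's interface: the `Fixup` struct, the function `applyPCRel16`, the 16-bit little-endian field layout and the choice to measure the displacement from the start of the patched field are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A pending patch: the encoder knew where the value goes, but not the value.
struct Fixup {
  std::size_t Offset; // byte offset of the 16-bit field in the code buffer
  std::size_t Target; // byte offset of the referenced label, known after layout
};

// Writes the displacement (Target - Offset) into the buffer as a
// little-endian signed 16-bit value. Returns false if the field lies
// outside the buffer or the displacement does not fit into 16 bits.
bool applyPCRel16(std::vector<std::uint8_t> &Code, const Fixup &F) {
  if (F.Offset + 2 > Code.size())
    return false;
  const std::int64_t Disp = static_cast<std::int64_t>(F.Target) -
                            static_cast<std::int64_t>(F.Offset);
  if (Disp < INT16_MIN || Disp > INT16_MAX)
    return false;
  const auto Bits = static_cast<std::uint16_t>(static_cast<std::int16_t>(Disp));
  Code[F.Offset] = static_cast<std::uint8_t>(Bits & 0xFF);
  Code[F.Offset + 1] = static_cast<std::uint8_t>(Bits >> 8);
  return true;
}
```

In this model the code emitter would write zeros into the field and record a `Fixup`. Once the final position of the label is known, the patch is applied in place. A fixup whose target is still unknown at that point is the kind that would have to be turned into a relocation, which is the job the table assigns to LEGELFObjectWriter.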