// cranelift_codegen/machinst/buffer.rs

//! In-memory representation of compiled machine code, with labels and fixups to
//! refer to those labels. Handles constant-pool island insertion and also
//! veneer insertion for out-of-range jumps.
//!
//! This code exists to solve three problems:
//!
//! - Branch targets for forward branches are not known until later, when we
//!   emit code in a single pass through the instruction structs.
//!
//! - On many architectures, address references or offsets have limited range.
//!   For example, on AArch64, conditional branches can only target code +/- 1MB
//!   from the branch itself.
//!
//! - The lowering of control flow from the CFG-with-edges produced by
//!   [BlockLoweringOrder](super::BlockLoweringOrder), combined with many empty
//!   edge blocks when the register allocator does not need to insert any
//!   spills/reloads/moves in edge blocks, results in many suboptimal branch
//!   patterns. The lowering also pays no attention to block order, and so
//!   two-target conditional forms (cond-br followed by uncond-br) can often be
//!   avoided because one of the targets is the fallthrough. There are several
//!   cases here where we can simplify to use fewer branches.
//!
//! This "buffer" implements a single-pass code emission strategy (with a later
//! "fixup" pass, but only through recorded fixups, not all instructions). The
//! basic idea is:
//!
//! - Emit branches as they are, including two-target (cond/uncond) compound
//!   forms, but with zero offsets and optimistically assuming the target will be
//!   in range. Record the "fixup" for later. Targets are denoted instead by
//!   symbolic "labels" that are then bound to certain offsets in the buffer as
//!   we emit code. (Nominally, there is a label at the start of every basic
//!   block.)
//!
//! - As we do this, track the offset in the buffer at which the first label
//!   reference "goes out of range". We call this the "deadline". If we reach the
//!   deadline and we still have not bound the label to which an unresolved branch
//!   refers, we have a problem!
//!
//! - To solve this problem, we emit "islands" full of "veneers". An island is
//!   simply a chunk of code inserted in the middle of the code actually produced
//!   by the emitter (e.g., vcode iterating over instruction structs). The emitter
//!   has some awareness of this: it either asks for an island between blocks, so
//!   it is not accidentally executed, or else it emits a branch around the island
//!   when all other options fail (see `Inst::EmitIsland` meta-instruction).
//!
//! - A "veneer" is an instruction (or sequence of instructions) in an "island"
//!   that implements a longer-range reference to a label. The idea is that, for
//!   example, a branch with a limited range can branch to a "veneer" instead,
//!   which is simply a branch in a form that can use a longer-range reference. On
//!   AArch64, for example, conditionals have a +/- 1 MB range, but a conditional
//!   can branch to an unconditional branch which has a +/- 128 MB range. Hence, a
//!   conditional branch's label reference can be fixed up with a "veneer" to
//!   achieve a longer range.
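//!
//!   An illustrative sketch of this shape (AArch64-flavored pseudo-assembly;
//!   the labels and layout are invented for exposition):
//!
//!   ```text
//!     b.eq veneer        ; conditional: limited (+/- 1 MB) reach
//!     ...
//!   veneer:              ; lives inside an emitted island
//!     b target           ; unconditional: longer (+/- 128 MB) reach
//!   ```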
//!
//! - To implement all of this, we require the backend to provide a `LabelUse`
//!   type that implements a trait. This is nominally an enum that records one of
//!   several kinds of references to an offset in code -- basically, a relocation
//!   type -- and will usually correspond to different instruction formats. The
//!   `LabelUse` implementation specifies the maximum range, how to patch in the
//!   actual label location when known, and how to generate a veneer to extend the
//!   range.
//!
//! That satisfies label references, but we still may have suboptimal branch
//! patterns. To clean up the branches, we do a simple "peephole"-style
//! optimization on the fly. To do so, the emitter (e.g., `Inst::emit()`)
//! informs the buffer of branches in the code and, in the case of conditionals,
//! the code that would have been emitted to invert this branch's condition. We
//! track the "latest branches": these are branches that are contiguous up to
//! the current offset. (If any code is emitted after a branch, that branch or
//! run of contiguous branches is no longer "latest".) The latest branches are
//! those that we can edit by simply truncating the buffer and doing something
//! else instead.
//!
//! To optimize branches, we implement several simple rules, and try to apply
//! them to the "latest branches" when possible:
//!
//! - A branch with a label target, when that label is bound to the ending
//!   offset of the branch (the fallthrough location), can be removed altogether,
//!   because the branch would have no effect.
//!
//! - An unconditional branch that starts at a label location, and branches to
//!   another label, results in a "label alias": all references to the label bound
//!   *to* this branch instruction are instead resolved to the *target* of the
//!   branch instruction. This effectively removes empty blocks that just
//!   unconditionally branch to the next block. We call this "branch threading".
//!
//! - A conditional followed by an unconditional, when the conditional branches
//!   to the unconditional's fallthrough, results in (i) the truncation of the
//!   unconditional, (ii) the inversion of the conditional's condition, and (iii)
//!   replacement of the conditional's target (using the original target of the
//!   unconditional). This is a fancy way of saying "we can flip a two-target
//!   conditional branch's taken/not-taken targets if it works better with our
//!   fallthrough". To make this work, the emitter actually gives the buffer
//!   *both* forms of every conditional branch: the true form is emitted into the
//!   buffer, and the "inverted" machine-code bytes are provided as part of the
//!   branch-fixup metadata.
//!
//! - An unconditional B preceded by another unconditional P, when B's label(s) have
//!   been redirected to target(B), can be removed entirely. This is an extension
//!   of the branch-threading optimization, and is valid because if we know there
//!   will be no fallthrough into this branch instruction (the prior instruction
//!   is an unconditional jump), and if we know we have successfully redirected
//!   all labels, then this branch instruction is unreachable. Note that this
//!   works because the redirection happens before the label is ever resolved
//!   (fixups happen at island emission time, at which point latest-branches are
//!   cleared, or at the end of emission), so we are sure to catch and redirect
//!   all possible paths to this instruction.
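//!
//! As an illustrative sketch of the cond/uncond inversion rule (pseudo-assembly
//! not tied to any particular ISA; labels invented for exposition):
//!
//! ```text
//! ; before:                       ; after:
//!   br_if cond, L_next              br_if !cond, L_far
//!   br L_far                      L_next:
//! L_next:                           ...
//!   ...
//! ```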
//!
//! # Branch-optimization Correctness
//!
//! The branch-optimization mechanism depends on a few data structures with
//! invariants, which are always held outside the scope of top-level public
//! methods:
//!
//! - The latest-branches list. Each entry describes a span of the buffer
//!   (start/end offsets), the label target, the corresponding fixup-list entry
//!   index, and the bytes (must be the same length) for the inverted form, if
//!   conditional. The list of labels that are bound to the start-offset of this
//!   branch is *complete* (if any label has a resolved offset equal to `start`
//!   and is not an alias, it must appear in this list) and *precise* (no label
//!   in this list can be bound to another offset). No label in this list should
//!   be an alias. No two branch ranges can overlap, and branches are in
//!   ascending-offset order.
//!
//! - The labels-at-tail list. This contains all MachLabels that have been bound
//!   to (whose resolved offsets are equal to) the tail offset of the buffer.
//!   No label in this list should be an alias.
//!
//! - The label_offsets array, containing the bound offset of a label or
//!   UNKNOWN. No label can be bound at an offset greater than the current
//!   buffer tail.
//!
//! - The label_aliases array, containing another label to which a label is
//!   bound or UNKNOWN. A label's resolved offset is the resolved offset
//!   of the label it is aliased to, if this is set.
//!
//! We argue below, at each method, how the invariants in these data structures
//! are maintained (grep for "Post-invariant").
//!
//! Given these invariants, we argue why each optimization preserves execution
//! semantics below (grep for "Preserves execution semantics").
//!
//! # Avoiding Quadratic Behavior
//!
//! There are two cases where we've had to take some care to avoid
//! quadratic worst-case behavior:
//!
//! - The "labels at this branch" list can grow unboundedly if the
//!   code generator binds many labels at one location. If the count
//!   gets too high (defined by the `LABEL_LIST_THRESHOLD` constant), we
//!   simply abort an optimization early in a way that is always correct
//!   but is conservative.
//!
//! - The fixup list can interact with island emission to create
//!   "quadratic island behavior". In a little more detail, one can hit
//!   this behavior by having some pending fixups (forward label
//!   references) with long-range label-use kinds, and some others
//!   with shorter-range references that nonetheless still are pending
//!   long enough to trigger island generation. In such a case, we
//!   process the fixup list, generate veneers to extend some forward
//!   references' ranges, but leave the other (longer-range) ones
//!   alone. The way this was implemented put them back on a list and
//!   resulted in quadratic behavior.
//!
//!   To avoid this, fixups are split into two lists: one "pending" list and one
//!   final list. The pending list is kept around for handling fixups related to
//!   branches so it can be edited/truncated. When an island is reached, which
//!   starts processing fixups, all pending fixups are flushed into the final
//!   list. The final list is a `BinaryHeap` which enables fixup processing to
//!   only process those which are required during island emission, deferring
//!   all longer-range fixups to later.
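//!
//! # Sketch of Typical Use
//!
//! A hedged sketch of how a backend drives this buffer. The exact emitter
//! calls vary per ISA, and `MyInst` here is a hypothetical `VCodeInst`
//! implementation, so this is illustrative pseudocode rather than a doctest:
//!
//! ```ignore
//! let mut buf = MachBuffer::<MyInst>::new();
//! let target = buf.get_label();      // forward reference, offset unknown
//! // ... emit a branch whose fixup refers to `target`; the buffer records
//! // the fixup and tracks its out-of-range "deadline" ...
//! // ... emit more code; if the deadline nears, an island with veneers
//! // is emitted ...
//! buf.bind_label(target, ctrl_plane); // now the fixup can be patched
//! ```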
use crate::binemit::{Addend, CodeOffset, Reloc};
use crate::ir::function::FunctionParameters;
use crate::ir::{DebugTag, ExceptionTag, ExternalName, RelSourceLoc, SourceLoc, TrapCode};
use crate::isa::unwind::UnwindInst;
use crate::machinst::{
    BlockIndex, MachInstLabelUse, TextSectionBuilder, VCodeConstant, VCodeConstants, VCodeInst,
};
use crate::trace;
use crate::{MachInstEmitState, ir};
use crate::{VCodeConstantData, timing};
use alloc::collections::BinaryHeap;
use alloc::string::String;
use alloc::vec::Vec;
use core::cmp::Ordering;
use core::mem;
use core::ops::Range;
use cranelift_control::ControlPlane;
use cranelift_entity::{PrimaryMap, SecondaryMap, entity_impl};
use smallvec::SmallVec;

#[cfg(feature = "enable-serde")]
use serde::{Deserialize, Serialize};

#[cfg(feature = "enable-serde")]
pub trait CompilePhase {
    type MachSrcLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone;
    type SourceLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone;
}

#[cfg(not(feature = "enable-serde"))]
pub trait CompilePhase {
    type MachSrcLocType: core::fmt::Debug + PartialEq + Clone;
    type SourceLocType: core::fmt::Debug + PartialEq + Clone;
}

/// Status of a compiled artifact that needs patching before being used.
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
pub struct Stencil;

/// Status of a compiled artifact ready to use.
#[derive(Clone, Debug, PartialEq)]
pub struct Final;

impl CompilePhase for Stencil {
    type MachSrcLocType = MachSrcLoc<Stencil>;
    type SourceLocType = RelSourceLoc;
}

impl CompilePhase for Final {
    type MachSrcLocType = MachSrcLoc<Final>;
    type SourceLocType = SourceLoc;
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ForceVeneers {
    Yes,
    No,
}

/// A buffer of output to be produced, fixed up, and then emitted to a CodeSink
/// in bulk.
///
/// This struct uses `SmallVec`s to support small-ish function bodies without
/// any heap allocation. As such, it will be several kilobytes large. This is
/// likely fine as long as it is stack-allocated for function emission then
/// thrown away; but beware if many buffer objects are retained persistently.
pub struct MachBuffer<I: VCodeInst> {
    /// The buffer contents, as raw bytes.
    data: SmallVec<[u8; 1024]>,
    /// The required alignment of this buffer.
    min_alignment: u32,
    /// Any relocations referring to this code. Note that only *external*
    /// relocations are tracked here; references to labels within the buffer are
    /// resolved before emission.
    relocs: SmallVec<[MachReloc; 16]>,
    /// Any trap records referring to this code.
    traps: SmallVec<[MachTrap; 16]>,
    /// Any call site records referring to this code.
    call_sites: SmallVec<[MachCallSite; 16]>,
    /// Any patchable call site locations.
    patchable_call_sites: SmallVec<[MachPatchableCallSite; 16]>,
    /// Any exception-handler records referred to at call sites.
    exception_handlers: SmallVec<[MachExceptionHandler; 16]>,
    /// Any source location mappings referring to this code.
    srclocs: SmallVec<[MachSrcLoc<Stencil>; 64]>,
    /// Any debug tags referring to this code.
    debug_tags: Vec<MachDebugTags>,
    /// Pool of debug tags referenced by `MachDebugTags` entries.
    debug_tag_pool: Vec<DebugTag>,
    /// Any user stack maps for this code.
    ///
    /// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted
    /// by code offset, and each stack map covers `span` bytes on the stack.
    user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>,
    /// Any unwind info at a given location.
    unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>,
    /// The current source location in progress (after `start_srcloc()` and
    /// before `end_srcloc()`). This is a (start_offset, src_loc) tuple.
    cur_srcloc: Option<(CodeOffset, RelSourceLoc)>,
    /// Known label offsets; `UNKNOWN_LABEL_OFFSET` if unknown.
    label_offsets: SmallVec<[CodeOffset; 16]>,
    /// Label aliases: when one label points to an unconditional jump, and that
    /// jump points to another label, we can redirect references to the first
    /// label immediately to the second.
    ///
    /// Invariant: we don't have label-alias cycles. We ensure this by,
    /// before setting label A to alias label B, resolving B's alias
    /// target (iteratively until a non-aliased label); if B is already
    /// aliased to A, then we cannot alias A back to B.
    label_aliases: SmallVec<[MachLabel; 16]>,
    /// Constants that must be emitted at some point.
    pending_constants: SmallVec<[VCodeConstant; 16]>,
    /// Byte size of all constants in `pending_constants`.
    pending_constants_size: CodeOffset,
    /// Traps that must be emitted at some point.
    pending_traps: SmallVec<[MachLabelTrap; 16]>,
    /// Fixups that haven't yet been flushed into `fixup_records` below and may
    /// be related to branches that are chomped. These all get added to
    /// `fixup_records` during island emission.
    pending_fixup_records: SmallVec<[MachLabelFixup<I>; 16]>,
    /// The nearest upcoming deadline for entries in `pending_fixup_records`.
    pending_fixup_deadline: CodeOffset,
    /// Fixups that must be performed after all code is emitted.
    fixup_records: BinaryHeap<MachLabelFixup<I>>,
    /// Latest branches, to facilitate in-place editing for better fallthrough
    /// behavior and empty-block removal.
    latest_branches: SmallVec<[MachBranch; 4]>,
    /// All labels at the current offset (emission tail). This is lazily
    /// cleared: it is actually accurate as long as the current offset is
    /// `labels_at_tail_off`, but if `cur_offset()` has grown larger, it should
    /// be considered as empty.
    ///
    /// For correctness, this *must* be complete (i.e., the vector must contain
    /// all labels whose offsets are resolved to the current tail), because we
    /// rely on it to update labels when we truncate branches.
    labels_at_tail: SmallVec<[MachLabel; 4]>,
    /// The last offset at which `labels_at_tail` is valid. It is conceptually
    /// always describing the tail of the buffer, but we do not clear
    /// `labels_at_tail` eagerly when the tail grows, rather we lazily clear it
    /// when the offset has grown past this (`labels_at_tail_off`) point.
    /// Always <= `cur_offset()`.
    labels_at_tail_off: CodeOffset,
    /// Metadata about all constants that this function has access to.
    ///
    /// This records the size/alignment of all constants (not the actual data)
    /// along with the last available label generated for the constant. This map
    /// is consulted when constants are referred to and the label assigned to a
    /// constant may change over time as well.
    constants: PrimaryMap<VCodeConstant, MachBufferConstant>,
    /// All recorded usages of constants as pairs of the constant and where the
    /// constant needs to be placed within `self.data`. Note that the same
    /// constant may appear in this array multiple times if it was emitted
    /// multiple times.
    used_constants: SmallVec<[(VCodeConstant, CodeOffset); 4]>,
    /// Indicates when a patchable region is currently open, to guard that it's
    /// not possible to nest patchable regions.
    open_patchable: bool,
    /// Stack frame layout metadata. If provided for a MachBuffer
    /// containing a function body, this allows interpretation of
    /// runtime state given a view of an active stack frame.
    frame_layout: Option<MachBufferFrameLayout>,
}

impl MachBufferFinalized<Stencil> {
    /// Get a finalized machine buffer by applying the function's base source location.
    pub fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachBufferFinalized<Final> {
        MachBufferFinalized {
            data: self.data,
            relocs: self.relocs,
            traps: self.traps,
            call_sites: self.call_sites,
            patchable_call_sites: self.patchable_call_sites,
            exception_handlers: self.exception_handlers,
            srclocs: self
                .srclocs
                .into_iter()
                .map(|srcloc| srcloc.apply_base_srcloc(base_srcloc))
                .collect(),
            debug_tags: self.debug_tags,
            debug_tag_pool: self.debug_tag_pool,
            user_stack_maps: self.user_stack_maps,
            unwind_info: self.unwind_info,
            alignment: self.alignment,
            frame_layout: self.frame_layout,
            nop_units: self.nop_units,
        }
    }
}

/// A `MachBuffer` once emission is completed: holds generated code and records,
/// without fixups. This allows the type to be independent of the backend.
#[derive(PartialEq, Debug, Clone)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub struct MachBufferFinalized<T: CompilePhase> {
    /// The buffer contents, as raw bytes.
    pub(crate) data: SmallVec<[u8; 1024]>,
    /// Any relocations referring to this code. Note that only *external*
    /// relocations are tracked here; references to labels within the buffer are
    /// resolved before emission.
    pub(crate) relocs: SmallVec<[FinalizedMachReloc; 16]>,
    /// Any trap records referring to this code.
    pub(crate) traps: SmallVec<[MachTrap; 16]>,
    /// Any call site records referring to this code.
    pub(crate) call_sites: SmallVec<[MachCallSite; 16]>,
    /// Any patchable call site locations referring to this code.
    pub(crate) patchable_call_sites: SmallVec<[MachPatchableCallSite; 16]>,
    /// Any exception-handler records referred to at call sites.
    pub(crate) exception_handlers: SmallVec<[FinalizedMachExceptionHandler; 16]>,
    /// Any source location mappings referring to this code.
    pub(crate) srclocs: SmallVec<[T::MachSrcLocType; 64]>,
    /// Any debug tags referring to this code.
    pub(crate) debug_tags: Vec<MachDebugTags>,
    /// Pool of debug tags referenced by `MachDebugTags` entries.
    pub(crate) debug_tag_pool: Vec<DebugTag>,
    /// Any user stack maps for this code.
    ///
    /// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted
    /// by code offset, and each stack map covers `span` bytes on the stack.
    pub(crate) user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>,
    /// Stack frame layout metadata. If provided for a MachBuffer
    /// containing a function body, this allows interpretation of
    /// runtime state given a view of an active stack frame.
    pub(crate) frame_layout: Option<MachBufferFrameLayout>,
    /// Any unwind info at a given location.
    pub unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>,
    /// The required alignment of this buffer.
    pub alignment: u32,
    /// The means by which to NOP out patchable call sites.
    ///
    /// This allows a consumer of a `MachBufferFinalized` to disable
    /// patchable call sites (which are enabled by default) without
    /// specific knowledge of the target ISA.
    ///
    /// Each entry is one form of nop, and these are required to be
    /// sorted in ascending-size order.
    pub nop_units: Vec<Vec<u8>>,
}

const UNKNOWN_LABEL_OFFSET: CodeOffset = 0xffff_ffff;
const UNKNOWN_LABEL: MachLabel = MachLabel(0xffff_ffff);

/// Threshold on max length of `labels_at_this_branch` list to avoid
/// unbounded quadratic behavior (see comment below at use-site).
const LABEL_LIST_THRESHOLD: usize = 100;

/// A label refers to some offset in a `MachBuffer`. It may not be resolved at
/// the point at which it is used by emitted code; the buffer records "fixups"
/// for references to the label, and will come back and patch the code
/// appropriately when the label's location is eventually known.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct MachLabel(u32);
entity_impl!(MachLabel);

impl MachLabel {
    /// Get a label for a block. (The first N MachLabels are always reserved for
    /// the N blocks in the vcode.)
    pub fn from_block(bindex: BlockIndex) -> MachLabel {
        MachLabel(bindex.index() as u32)
    }

    /// Creates a string representing this label, for convenience.
    pub fn to_string(&self) -> String {
        format!("label{}", self.0)
    }
}

impl Default for MachLabel {
    fn default() -> Self {
        UNKNOWN_LABEL
    }
}

/// Represents the beginning of an editable region in the [`MachBuffer`], while code emission is
/// still occurring. An [`OpenPatchRegion`] is closed by [`MachBuffer::end_patchable`], consuming
/// the [`OpenPatchRegion`] token in the process.
pub struct OpenPatchRegion(usize);

/// A region in the [`MachBuffer`] code buffer that can be edited prior to finalization. An example
/// of where you might want to use this is for patching instructions that mention constants that
/// won't be known until later: [`MachBuffer::start_patchable`] can be used to begin the patchable
/// region, instructions can be emitted with placeholder constants, and the [`PatchRegion`] token
/// can be produced by [`MachBuffer::end_patchable`]. Once the values of those constants are known,
/// the [`PatchRegion::patch`] function can be used to get a mutable buffer to the instruction
/// bytes, and the constant uses can be updated directly.
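///
/// A hedged sketch of the intended flow (`buf` and the placeholder handling
/// are hypothetical; this is illustrative pseudocode, not a doctest):
///
/// ```ignore
/// let open = buf.start_patchable();
/// // ... emit instructions that embed a placeholder constant ...
/// let region = buf.end_patchable(open);
/// // Later, once the real value is known:
/// let bytes: &mut [u8] = region.patch(&mut buf);
/// // ... overwrite the placeholder bytes in place ...
/// ```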
pub struct PatchRegion {
    range: Range<usize>,
}

impl PatchRegion {
    /// Consume the patch region to yield a mutable slice of the [`MachBuffer`] data buffer.
    pub fn patch<I: VCodeInst>(self, buffer: &mut MachBuffer<I>) -> &mut [u8] {
        &mut buffer.data[self.range]
    }
}

impl<I: VCodeInst> MachBuffer<I> {
    /// Create a new, empty `MachBuffer`.
    pub fn new() -> MachBuffer<I> {
        MachBuffer {
            data: SmallVec::new(),
            min_alignment: I::function_alignment().minimum,
            relocs: SmallVec::new(),
            traps: SmallVec::new(),
            call_sites: SmallVec::new(),
            patchable_call_sites: SmallVec::new(),
            exception_handlers: SmallVec::new(),
            srclocs: SmallVec::new(),
            debug_tags: vec![],
            debug_tag_pool: vec![],
            user_stack_maps: SmallVec::new(),
            unwind_info: SmallVec::new(),
            cur_srcloc: None,
            label_offsets: SmallVec::new(),
            label_aliases: SmallVec::new(),
            pending_constants: SmallVec::new(),
            pending_constants_size: 0,
            pending_traps: SmallVec::new(),
            pending_fixup_records: SmallVec::new(),
            pending_fixup_deadline: u32::MAX,
            fixup_records: Default::default(),
            latest_branches: SmallVec::new(),
            labels_at_tail: SmallVec::new(),
            labels_at_tail_off: 0,
            constants: Default::default(),
            used_constants: Default::default(),
            open_patchable: false,
            frame_layout: None,
        }
    }

    /// Current offset from start of buffer.
    pub fn cur_offset(&self) -> CodeOffset {
        self.data.len() as CodeOffset
    }

    /// Add a byte.
    pub fn put1(&mut self, value: u8) {
        self.data.push(value);

        // Post-invariant: conceptual-labels_at_tail contains a complete and
        // precise list of labels bound at `cur_offset()`. We have advanced
        // `cur_offset()`, hence if it had been equal to `labels_at_tail_off`
        // before, it is not anymore (and it cannot become equal, because
        // `labels_at_tail_off` is always <= `cur_offset()`). Thus the list is
        // conceptually empty (even though it is only lazily cleared). No labels
        // can be bound at this new offset (by invariant on `label_offsets`).
        // Hence the invariant holds.
    }

    /// Add 2 bytes.
    pub fn put2(&mut self, value: u16) {
        let bytes = value.to_le_bytes();
        self.data.extend_from_slice(&bytes[..]);

        // Post-invariant: as for `put1()`.
    }

    /// Add 4 bytes.
    pub fn put4(&mut self, value: u32) {
        let bytes = value.to_le_bytes();
        self.data.extend_from_slice(&bytes[..]);

        // Post-invariant: as for `put1()`.
    }

    /// Add 8 bytes.
    pub fn put8(&mut self, value: u64) {
        let bytes = value.to_le_bytes();
        self.data.extend_from_slice(&bytes[..]);

        // Post-invariant: as for `put1()`.
    }

    /// Add a slice of bytes.
    pub fn put_data(&mut self, data: &[u8]) {
        self.data.extend_from_slice(data);

        // Post-invariant: as for `put1()`.
    }

    /// Reserve appended space and return a mutable slice referring to it.
    pub fn get_appended_space(&mut self, len: usize) -> &mut [u8] {
        let off = self.data.len();
        let new_len = self.data.len() + len;
        self.data.resize(new_len, 0);
        &mut self.data[off..]

        // Post-invariant: as for `put1()`.
    }

    /// Align up to the given alignment.
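    ///
    /// An illustrative sketch (`buf` is a hypothetical `MachBuffer` whose
    /// current offset is 3; not a doctest):
    ///
    /// ```ignore
    /// buf.align_to(4); // pads with one zero byte; cur_offset() becomes 4
    /// ```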
    pub fn align_to(&mut self, align_to: CodeOffset) {
        trace!("MachBuffer: align to {}", align_to);
        assert!(
            align_to.is_power_of_two(),
            "{align_to} is not a power of two"
        );
        while self.cur_offset() & (align_to - 1) != 0 {
            self.put1(0);
        }

        // Post-invariant: as for `put1()`.
    }

    /// Begin a region of patchable code. There is one requirement for the
    /// code that is emitted: It must not introduce any instructions that
    /// could be chomped (branches are an example of this). In other words,
    /// you must not call [`MachBuffer::add_cond_branch`] or
    /// [`MachBuffer::add_uncond_branch`] between calls to this method and
    /// [`MachBuffer::end_patchable`].
    pub fn start_patchable(&mut self) -> OpenPatchRegion {
        assert!(!self.open_patchable, "Patchable regions may not be nested");
        self.open_patchable = true;
        OpenPatchRegion(usize::try_from(self.cur_offset()).unwrap())
    }

    /// End a region of patchable code, yielding a [`PatchRegion`] value that
    /// can be consumed later to produce a one-off mutable slice to the
    /// associated region of the data buffer.
    pub fn end_patchable(&mut self, open: OpenPatchRegion) -> PatchRegion {
        // No need to assert the state of `open_patchable` here, as we take
        // ownership of the only `OpenPatchRegion` value.
        self.open_patchable = false;
        let end = usize::try_from(self.cur_offset()).unwrap();
        PatchRegion { range: open.0..end }
    }

    /// Allocate a `Label` to refer to some offset. May not be bound to a fixed
    /// offset yet.
    pub fn get_label(&mut self) -> MachLabel {
        let l = self.label_offsets.len() as u32;
        self.label_offsets.push(UNKNOWN_LABEL_OFFSET);
        self.label_aliases.push(UNKNOWN_LABEL);
        trace!("MachBuffer: new label -> {:?}", MachLabel(l));
        MachLabel(l)

        // Post-invariant: the only mutation is to add a new label; it has no
        // bound offset yet, so it trivially satisfies all invariants.
    }

    /// Reserve the first N MachLabels for blocks.
    pub fn reserve_labels_for_blocks(&mut self, blocks: usize) {
        trace!("MachBuffer: first {} labels are for blocks", blocks);
        debug_assert!(self.label_offsets.is_empty());
        self.label_offsets.resize(blocks, UNKNOWN_LABEL_OFFSET);
        self.label_aliases.resize(blocks, UNKNOWN_LABEL);

        // Post-invariant: as for `get_label()`.
    }

    /// Registers metadata in this `MachBuffer` about the `constants` provided.
    ///
    /// This will record the size/alignment of all constants which will prepare
    /// them for emission later on.
    pub fn register_constants(&mut self, constants: &VCodeConstants) {
        for (c, val) in constants.iter() {
            self.register_constant(&c, val);
        }
    }

    /// Similar to [`MachBuffer::register_constants`] but registers
    /// metadata for a single constant. This function is useful in
    /// situations where not all constants are known at the time of
    /// emission.
642    pub fn register_constant(&mut self, constant: &VCodeConstant, data: &VCodeConstantData) {
643        let c2 = self.constants.push(MachBufferConstant {
644            upcoming_label: None,
645            align: data.alignment(),
646            size: data.as_slice().len(),
647        });
648        assert_eq!(*constant, c2);
649    }
650
651    /// Completes constant emission by iterating over `self.used_constants` and
652    /// filling in the "holes" with the constant values provided by `constants`.
653    ///
654    /// Returns the alignment required for this entire buffer. Alignment starts
655    /// at the ISA's minimum function alignment and can be increased due to
656    /// constant requirements.
657    fn finish_constants(&mut self, constants: &VCodeConstants) -> u32 {
658        let mut alignment = self.min_alignment;
659        for (constant, offset) in mem::take(&mut self.used_constants) {
660            let constant = constants.get(constant);
661            let data = constant.as_slice();
662            self.data[offset as usize..][..data.len()].copy_from_slice(data);
663            alignment = constant.alignment().max(alignment);
664        }
665        alignment
666    }
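The hole-filling step above can be sketched in isolation. This is a minimal model, not the real API: constant data is carried inline as `(offset, bytes, align)` tuples instead of being looked up in a `VCodeConstants` table, and the buffer is a plain `Vec<u8>`.

```rust
// Sketch of `finish_constants`: copy each used constant's bytes into the
// hole reserved for it earlier, and return the max of all alignment
// requirements (starting from the minimum function alignment).
fn finish_constants(buf: &mut Vec<u8>, used: Vec<(usize, &[u8], u32)>, min_align: u32) -> u32 {
    let mut alignment = min_align;
    for (offset, data, align) in used {
        buf[offset..offset + data.len()].copy_from_slice(data);
        alignment = alignment.max(align);
    }
    alignment
}

fn main() {
    let mut buf = vec![0u8; 8];
    // One constant: 2 bytes at offset 4, requiring 16-byte alignment.
    let align = finish_constants(&mut buf, vec![(4, &[0xAA, 0xBB][..], 16)], 4);
    assert_eq!(align, 16);
    assert_eq!(&buf[4..6], &[0xAA, 0xBB][..]);
}
```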
667
668    /// Returns a label that can be used to refer to the `constant` provided.
669    ///
670    /// This will automatically defer a new constant to be emitted for
671    /// `constant` if it has not been previously emitted. Note that this
672    /// function may return a different label for the same constant at
673    /// different points in time. The label is valid to use only from the
674    /// current location; the MachBuffer takes care to emit the same constant
675    /// multiple times if needed so the constant is always in range.
676    pub fn get_label_for_constant(&mut self, constant: VCodeConstant) -> MachLabel {
677        let MachBufferConstant {
678            align,
679            size,
680            upcoming_label,
681        } = self.constants[constant];
682        if let Some(label) = upcoming_label {
683            return label;
684        }
685
686        let label = self.get_label();
687        trace!(
688            "defer constant: eventually emit {size} bytes aligned \
689             to {align} at label {label:?}",
690        );
691        self.pending_constants.push(constant);
692        self.pending_constants_size += size as u32;
693        self.constants[constant].upcoming_label = Some(label);
694        label
695    }
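The label-caching behavior of `get_label_for_constant` can be modeled separately: while a constant has a pending `upcoming_label` (it is queued for the next island), the same label is returned; once the island is emitted, the slot is cleared, so a later request gets a fresh label and the constant is emitted again. Names here (`Pool`, `label_for`) are illustrative stand-ins, with `Option<u32>` playing the role of the per-constant `upcoming_label` field.

```rust
// Toy model of deferred-constant label caching.
struct Pool {
    next_label: u32,
    upcoming: Vec<Option<u32>>, // one `upcoming_label` slot per constant
}

impl Pool {
    fn label_for(&mut self, c: usize) -> u32 {
        if let Some(l) = self.upcoming[c] {
            return l; // already queued for the next island
        }
        let l = self.next_label;
        self.next_label += 1;
        self.upcoming[c] = Some(l);
        l
    }

    fn emit_island(&mut self) {
        // Constants are placed; their labels are no longer pending.
        for slot in &mut self.upcoming {
            *slot = None;
        }
    }
}

fn main() {
    let mut p = Pool { next_label: 0, upcoming: vec![None] };
    let a = p.label_for(0);
    assert_eq!(a, p.label_for(0)); // same label while the constant is pending
    p.emit_island();
    assert_ne!(a, p.label_for(0)); // a new label (and a re-emission) afterward
}
```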
696
697    /// Bind a label to the current offset. A label can only be bound once.
698    pub fn bind_label(&mut self, label: MachLabel, ctrl_plane: &mut ControlPlane) {
699        trace!(
700            "MachBuffer: bind label {:?} at offset {}",
701            label,
702            self.cur_offset()
703        );
704        debug_assert_eq!(self.label_offsets[label.0 as usize], UNKNOWN_LABEL_OFFSET);
705        debug_assert_eq!(self.label_aliases[label.0 as usize], UNKNOWN_LABEL);
706        let offset = self.cur_offset();
707        self.label_offsets[label.0 as usize] = offset;
708        self.lazily_clear_labels_at_tail();
709        self.labels_at_tail.push(label);
710
711        // Invariants hold: bound offset of label is <= cur_offset (in fact it
712        // is equal). If the `labels_at_tail` list was complete and precise
713        // before, it is still, because we have bound this label to the current
714        // offset and added it to the list (which contains all labels at the
715        // current offset).
716
717        self.optimize_branches(ctrl_plane);
718
719        // Post-invariant: by `optimize_branches()` (see argument there).
720    }
721
722    /// Lazily clear `labels_at_tail` if the tail offset has moved beyond the
723    /// offset that it applies to.
724    fn lazily_clear_labels_at_tail(&mut self) {
725        let offset = self.cur_offset();
726        if offset > self.labels_at_tail_off {
727            self.labels_at_tail_off = offset;
728            self.labels_at_tail.clear();
729        }
730
731        // Post-invariant: either labels_at_tail_off was at cur_offset, and
732        // state is untouched, or was less than cur_offset, in which case the
733        // labels_at_tail list was conceptually empty, and is now actually
734        // empty.
735    }
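The lazy-clear idiom can be shown on its own: `labels_at_tail` is only meaningful while `labels_at_tail_off` equals the current offset; once emission moves past that point the list is conceptually empty, and it is actually cleared on the next access rather than eagerly. A minimal sketch with assumed field names mirroring the buffer's:

```rust
// Sketch of `lazily_clear_labels_at_tail`.
struct Tail {
    labels_at_tail: Vec<u32>,
    labels_at_tail_off: u32,
}

impl Tail {
    fn lazily_clear(&mut self, cur_offset: u32) {
        if cur_offset > self.labels_at_tail_off {
            self.labels_at_tail_off = cur_offset;
            self.labels_at_tail.clear();
        }
    }
}

fn main() {
    let mut t = Tail { labels_at_tail: vec![3, 7], labels_at_tail_off: 16 };
    t.lazily_clear(16); // no code emitted since offset 16: list still valid
    assert_eq!(t.labels_at_tail, vec![3, 7]);
    t.lazily_clear(20); // emission moved on: list was stale, now cleared
    assert!(t.labels_at_tail.is_empty());
}
```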
736
737    /// Resolve a label to an offset, if known. May return `UNKNOWN_LABEL_OFFSET`.
738    pub(crate) fn resolve_label_offset(&self, mut label: MachLabel) -> CodeOffset {
739        let mut iters = 0;
740        while self.label_aliases[label.0 as usize] != UNKNOWN_LABEL {
741            label = self.label_aliases[label.0 as usize];
742            // To protect against an infinite loop (despite our assurances to
743            // ourselves that the invariants make this impossible), assert out
744            // after 1M iterations. The number of basic blocks is limited
745            // in most contexts anyway so this should be impossible to hit with
746            // a legitimate input.
747            iters += 1;
748            assert!(iters < 1_000_000, "Unexpected cycle in label aliases");
749        }
750        self.label_offsets[label.0 as usize]
751
752        // Post-invariant: no mutations.
753    }
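The alias-chasing loop above can be sketched as a free function over the two parallel tables, with `u32::MAX` standing in for both `UNKNOWN_LABEL` and `UNKNOWN_LABEL_OFFSET`:

```rust
// Sketch of `resolve_label_offset`: follow `label_aliases` until a label
// that is not an alias, then read its bound offset, with the same 1M
// iteration cap as a cycle guard.
const UNKNOWN: u32 = u32::MAX;

fn resolve(label_aliases: &[u32], label_offsets: &[u32], mut label: u32) -> u32 {
    let mut iters = 0;
    while label_aliases[label as usize] != UNKNOWN {
        label = label_aliases[label as usize];
        iters += 1;
        assert!(iters < 1_000_000, "Unexpected cycle in label aliases");
    }
    label_offsets[label as usize]
}

fn main() {
    // L0 -> L1 -> L2; only L2 is bound (at offset 64).
    let aliases = [1, 2, UNKNOWN];
    let offsets = [UNKNOWN, UNKNOWN, 64];
    assert_eq!(resolve(&aliases, &offsets, 0), 64);
    assert_eq!(resolve(&aliases, &offsets, 2), 64);
}
```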
754
755    /// Emit a reference to the given label with the given reference type (i.e.,
756    /// branch-instruction format) at the current offset. This is like a
757    /// relocation, but handled internally.
758    ///
759    /// This can be called before the branch is actually emitted; fixups will
760    /// not happen until an island is emitted or the buffer is finished.
761    pub fn use_label_at_offset(&mut self, offset: CodeOffset, label: MachLabel, kind: I::LabelUse) {
762        trace!(
763            "MachBuffer: use_label_at_offset: offset {} label {:?} kind {:?}",
764            offset, label, kind
765        );
766
767        // Add the fixup, and update the worst-case island size based on a
768        // veneer for this label use.
769        let fixup = MachLabelFixup {
770            label,
771            offset,
772            kind,
773        };
774        self.pending_fixup_deadline = self
775            .pending_fixup_deadline
776            // Subtract one alignment from the deadline to account for
777            // the extra space taken by aligning an island.
778            .min(fixup.deadline() - I::LabelUse::ALIGN);
779        trace!("pending_fixup_deadline = {}", self.pending_fixup_deadline);
780        self.pending_fixup_records.push(fixup);
781
782        // Post-invariant: no mutations to branches/labels data structures.
783    }
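The deadline bookkeeping can be shown numerically. In this sketch, `record_use` stands in for the `min` update above: a fixup's deadline is its offset plus the maximum reach of its label-use kind, less one alignment for island padding. The `ALIGN` constant and `max_range` parameter model `I::LabelUse::ALIGN` and the per-kind range; the concrete values are illustrative, not a real ISA's.

```rust
// Sketch of how `pending_fixup_deadline` shrinks as label uses are recorded.
const ALIGN: u32 = 4; // models I::LabelUse::ALIGN

fn record_use(pending_deadline: u32, use_offset: u32, max_range: u32) -> u32 {
    let fixup_deadline = use_offset + max_range;
    // Subtract one alignment to leave room for aligning the island itself.
    pending_deadline.min(fixup_deadline - ALIGN)
}

fn main() {
    let mut deadline = u32::MAX; // no pending fixups yet
    // A branch at offset 100 with a +/- 1 MiB range.
    deadline = record_use(deadline, 100, 1 << 20);
    assert_eq!(deadline, 100 + (1 << 20) - ALIGN);
    // A shorter-range use at offset 200 tightens the deadline further.
    deadline = record_use(deadline, 200, 1 << 10);
    assert_eq!(deadline, 200 + (1 << 10) - ALIGN);
}
```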
784
785    /// Inform the buffer of an unconditional branch at the given offset,
786    /// targeting the given label. May be used to optimize branches.
787    /// The last added label-use must correspond to this branch.
788    /// This must be called when the current offset is equal to `start`; i.e.,
789    /// before actually emitting the branch. This implies that for a branch that
790    /// uses a label and is eligible for optimizations by the MachBuffer, the
791    /// proper sequence is:
792    ///
793    /// - Call `use_label_at_offset()` to emit the fixup record.
794    /// - Call `add_uncond_branch()` to make note of the branch.
795    /// - Emit the bytes for the branch's machine code.
796    ///
797    /// Additional requirement: no labels may be bound between `start` and `end`
798    /// (exclusive on both ends).
799    pub fn add_uncond_branch(&mut self, start: CodeOffset, end: CodeOffset, target: MachLabel) {
800        debug_assert!(
801            !self.open_patchable,
802            "Branch instruction inserted within a patchable region"
803        );
804        assert!(self.cur_offset() == start);
805        debug_assert!(end > start);
806        assert!(!self.pending_fixup_records.is_empty());
807        let fixup = self.pending_fixup_records.len() - 1;
808        self.lazily_clear_labels_at_tail();
809        self.latest_branches.push(MachBranch {
810            start,
811            end,
812            target,
813            fixup,
814            inverted: None,
815            labels_at_this_branch: self.labels_at_tail.clone(),
816        });
817
818        // Post-invariant: we asserted branch start is current tail; the list of
819        // labels at branch is cloned from list of labels at current tail.
820    }
821
822    /// Inform the buffer of a conditional branch at the given offset,
823    /// targeting the given label. May be used to optimize branches.
824    /// The last added label-use must correspond to this branch.
825    ///
826    /// Additional requirement: no labels may be bound between `start` and `end`
827    /// (exclusive on both ends).
828    pub fn add_cond_branch(
829        &mut self,
830        start: CodeOffset,
831        end: CodeOffset,
832        target: MachLabel,
833        inverted: &[u8],
834    ) {
835        debug_assert!(
836            !self.open_patchable,
837            "Branch instruction inserted within a patchable region"
838        );
839        assert!(self.cur_offset() == start);
840        debug_assert!(end > start);
841        assert!(!self.pending_fixup_records.is_empty());
842        debug_assert!(
843            inverted.len() == (end - start) as usize,
844            "branch length = {}, but inverted length = {}",
845            end - start,
846            inverted.len()
847        );
848        let fixup = self.pending_fixup_records.len() - 1;
849        let inverted = Some(SmallVec::from(inverted));
850        self.lazily_clear_labels_at_tail();
851        self.latest_branches.push(MachBranch {
852            start,
853            end,
854            target,
855            fixup,
856            inverted,
857            labels_at_this_branch: self.labels_at_tail.clone(),
858        });
859
860        // Post-invariant: we asserted branch start is current tail; labels at
861        // branch list is cloned from list of labels at current tail.
862    }
863
864    fn truncate_last_branch(&mut self) {
865        debug_assert!(
866            !self.open_patchable,
867            "Branch instruction truncated within a patchable region"
868        );
869
870        self.lazily_clear_labels_at_tail();
871        // Invariants hold at this point.
872
873        let b = self.latest_branches.pop().unwrap();
874        assert!(b.end == self.cur_offset());
875
876        // State:
877        //    [PRE CODE]
878        //  Offset b.start, b.labels_at_this_branch:
879        //    [BRANCH CODE]
880        //  cur_off, self.labels_at_tail -->
881        //    (end of buffer)
882        self.data.truncate(b.start as usize);
883        self.pending_fixup_records.truncate(b.fixup);
884
885        // Trim srclocs and debug tags now past the end of the buffer.
886        while let Some(last_srcloc) = self.srclocs.last_mut() {
887            if last_srcloc.end <= b.start {
888                break;
889            }
890            if last_srcloc.start < b.start {
891                last_srcloc.end = b.start;
892                break;
893            }
894            self.srclocs.pop();
895        }
896        while let Some(last_debug_tag) = self.debug_tags.last() {
897            if last_debug_tag.offset <= b.start {
898                break;
899            }
900            self.debug_tags.pop();
901        }
902
903        // State:
904        //    [PRE CODE]
905        //  cur_off, Offset b.start, b.labels_at_this_branch:
906        //    (end of buffer)
907        //
908        //  self.labels_at_tail -->  (past end of buffer)
909        let cur_off = self.cur_offset();
910        self.labels_at_tail_off = cur_off;
911        // State:
912        //    [PRE CODE]
913        //  cur_off, Offset b.start, b.labels_at_this_branch,
914        //  self.labels_at_tail:
915        //    (end of buffer)
916        //
917        // resolve_label_offset(l) for l in labels_at_tail:
918        //    (past end of buffer)
919
920        trace!(
921            "truncate_last_branch: truncated {:?}; off now {}",
922            b, cur_off
923        );
924
925        // Fix up resolved label offsets for labels at tail.
926        for &l in &self.labels_at_tail {
927            self.label_offsets[l.0 as usize] = cur_off;
928        }
929        // Old labels_at_this_branch are now at cur_off.
930        self.labels_at_tail.extend(b.labels_at_this_branch);
931
932        // Post-invariant: this operation is defined to truncate the buffer,
933        // which moves cur_off backward, and to move labels at the end of the
934        // buffer back to the start-of-branch offset.
935        //
936        // latest_branches satisfies all invariants:
937        // - it has no branches past the end of the buffer (branches are in
938        //   order, we removed the last one, and we truncated the buffer to just
939        //   before the start of that branch)
940        // - no labels were moved to lower offsets than the (new) cur_off, so
941        //   the labels_at_this_branch list for any other branch need not change.
942        //
943        // labels_at_tail satisfies all invariants:
944        // - all labels that were at the tail after the truncated branch are
945        //   moved backward to just before the branch, which becomes the new tail;
946        //   thus every element in the list should remain (ensured by `.extend()`
947        //   above).
948        // - all labels that refer to the new tail, which is the start-offset of
949        //   the truncated branch, must be present. The `labels_at_this_branch`
950        //   list in the truncated branch's record is a complete and precise list
951        //   of exactly these labels; we append these to labels_at_tail.
952        // - labels_at_tail_off is at cur_off after truncation occurs, so the
953        //   list is valid (not to be lazily cleared).
954        //
955        // The stated operation was performed:
956        // - For each label at the end of the buffer prior to this method, it
957        //   now resolves to the new (truncated) end of the buffer: it must have
958        //   been in `labels_at_tail` (this list is precise and complete, and
959        //   the tail was at the end of the truncated branch on entry), and we
960        //   iterate over this list and set `label_offsets` to the new tail.
961        //   None of these labels could have been an alias (by invariant), so
962        //   `label_offsets` is authoritative for each.
963        // - No other labels will be past the end of the buffer, because of the
964        //   requirement that no labels be bound to the middle of branch ranges
965        //   (see comments to `add_{cond,uncond}_branch()`).
966        // - The buffer is truncated to just before the last branch, and the
967        //   fixup record referring to that last branch is removed.
968    }
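The srcloc-trimming loop inside `truncate_last_branch` is worth isolating: records entirely past the truncation point are popped, and a record straddling it is clamped. In this sketch, plain `(start, end)` tuples stand in for the buffer's srcloc records.

```rust
// Sketch of the source-location trimming after buffer truncation.
fn trim(srclocs: &mut Vec<(u32, u32)>, trunc: u32) {
    while let Some(last) = srclocs.last_mut() {
        if last.1 <= trunc {
            break; // ends at or before the truncation point: keep as-is
        }
        if last.0 < trunc {
            last.1 = trunc; // straddles the point: clamp and stop
            break;
        }
        srclocs.pop(); // lies entirely past the point: drop
    }
}

fn main() {
    let mut locs = vec![(0, 8), (8, 12), (12, 16)];
    trim(&mut locs, 10);
    // (12, 16) is dropped; (8, 12) is clamped to (8, 10).
    assert_eq!(locs, vec![(0, 8), (8, 10)]);
}
```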
969
970    /// Performs various optimizations on branches pointing at the current label.
971    pub fn optimize_branches(&mut self, ctrl_plane: &mut ControlPlane) {
972        if ctrl_plane.get_decision() {
973            return;
974        }
975
976        self.lazily_clear_labels_at_tail();
977        // Invariants valid at this point.
978
979        trace!(
980            "enter optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}",
981            self.latest_branches, self.labels_at_tail, self.pending_fixup_records
982        );
983
984        // We continue to munch on branches at the tail of the buffer until no
985        // more rules apply. Note that the loop only continues if a branch is
986        // actually truncated (or if labels are redirected away from a branch),
987        // so this always makes progress.
988        while let Some(b) = self.latest_branches.last() {
989            let cur_off = self.cur_offset();
990            trace!("optimize_branches: last branch {:?} at off {}", b, cur_off);
991            // If there has been any code emission since the end of the last branch or
992            // label definition, then there's nothing we can edit (because we
993            // don't move code once placed, only back up and overwrite), so
994            // clear the records and finish.
995            if b.end < cur_off {
996                break;
997            }
998
999            // If the "labels at this branch" list on this branch is
1000            // longer than a threshold, don't do any simplification,
1001            // and let the branch remain to separate those labels from
1002            // the current tail. This avoids quadratic behavior (see
1003            // #3468): otherwise, if a long string of "goto next;
1004            // next:" patterns is emitted, all of the labels will
1005            // coalesce into a long list of aliases for the current
1006            // buffer tail. We must track all aliases of the current
1007            // tail for correctness, but we are also allowed to skip
1008            // optimization (removal) of any branch, so we take the
1009            // escape hatch here and let it stand. In effect this
1010            // "spreads" the many thousands of labels in the
1011            // pathological case among an actual (harmless but
1012            // suboptimal) instruction once per N labels.
1013            if b.labels_at_this_branch.len() > LABEL_LIST_THRESHOLD {
1014                break;
1015            }
1016
1017            // Invariant: we are looking at a branch that ends at the tail of
1018            // the buffer.
1019
1020            // For any branch, conditional or unconditional:
1021            // - If the target is a label at the current offset, then remove
1022            //   the branch, and reset all labels that targeted
1023            //   the current offset (end of branch) to the truncated
1024            //   end-of-code.
1025            //
1026            // Preserves execution semantics: a branch to its own fallthrough
1027            // address is equivalent to a no-op; in both cases, nextPC is the
1028            // fallthrough.
1029            if self.resolve_label_offset(b.target) == cur_off {
1030                trace!("branch with target == cur off; truncating");
1031                self.truncate_last_branch();
1032                continue;
1033            }
1034
1035            // If latest is an unconditional branch:
1036            //
1037            // - If the branch's target is not its own start address, then for
1038            //   each label at the start of branch, make the label an alias of the
1039            //   branch target, and remove the label from the "labels at this
1040            //   branch" list.
1041            //
1042            //   - Preserves execution semantics: an unconditional branch's
1043            //     only effect is to set PC to a new PC; this change simply
1044            //     collapses one step in the step-semantics.
1045            //
1046            //   - Post-invariant: the labels that were bound to the start of
1047            //     this branch become aliases, so they must not be present in any
1048            //     labels-at-this-branch list or the labels-at-tail list. The
1049            //     labels are removed from the latest-branch record's
1050            //     labels-at-this-branch list, and are never placed in the
1051            //     labels-at-tail list. Furthermore, it is correct that they are
1052            //     not in either list, because they are now aliases, and labels
1053            //     that are aliases remain aliases forever.
1054            //
1055            // - If there is a prior unconditional branch that ends just before
1056            //   this one begins, and this branch has no labels bound to its
1057            //   start, then we can truncate this branch, because it is entirely
1058            //   unreachable (we have redirected all labels that make it
1059            //   reachable otherwise). Do so and continue around the loop.
1060            //
1061            //   - Preserves execution semantics: the branch is unreachable,
1062            //     because execution can only flow into an instruction from the
1063            //     prior instruction's fallthrough or from a branch bound to that
1064            //     instruction's start offset. Unconditional branches have no
1065            //     fallthrough, so if the prior instruction is an unconditional
1066            //     branch, no fallthrough entry can happen. The
1067            //     labels-at-this-branch list is complete (by invariant), so if it
1068            //     is empty, then the instruction is entirely unreachable. Thus,
1069            //     it can be removed.
1070            //
1071            //   - Post-invariant: ensured by truncate_last_branch().
1072            //
1073            // - If there is a prior conditional branch whose target label
1074            //   resolves to the current offset (branches around the
1075            //   unconditional branch), then remove the unconditional branch,
1076            //   and make the target of the unconditional the target of the
1077            //   conditional instead.
1078            //
1079            //   - Preserves execution semantics: previously we had:
1080            //
1081            //         L1:
1082            //            cond_br L2
1083            //            br L3
1084            //         L2:
1085            //            (end of buffer)
1086            //
1087            //     by removing the last branch, we have:
1088            //
1089            //         L1:
1090            //            cond_br L2
1091            //         L2:
1092            //            (end of buffer)
1093            //
1094            //     we then fix up the records for the conditional branch to
1095            //     have:
1096            //
1097            //         L1:
1098            //           cond_br.inverted L3
1099            //         L2:
1100            //
1101            //     In the original code, control flow reaches L2 when the
1102            //     conditional branch's predicate is true, and L3 otherwise. In
1103            //     the optimized code, the same is true.
1104            //
1105            //   - Post-invariant: all edits to latest_branches and
1106            //     labels_at_tail are performed by `truncate_last_branch()`,
1107            //     which maintains the invariants at each step.
1108
1109            if b.is_uncond() {
1110                // Make any label bound at the current branch's start an alias
1111                // of the branch's target, if the target is not the branch
1112                // itself (i.e., an infinite loop).
1113                //
1114                // We cannot perform this aliasing if the target of this branch
1115                // ultimately aliases back here; if so, we need to keep this
1116                // branch, so break out of this loop entirely (and clear the
1117                // latest-branches list below).
1118                //
1119                // Note that this check is what prevents cycles from forming in
1120                // `self.label_aliases`. To see why, consider an arbitrary start
1121                // state:
1122                //
1123                // label_aliases[L1] = L2, label_aliases[L2] = L3, ..., up to
1124                // Ln, which is not aliased.
1125                //
1126                // We would create a cycle if we assigned label_aliases[Ln]
1127                // = L1.  Note that the below assignment is the only write
1128                // to label_aliases.
1129                //
1130                // By our other invariants, we have that Ln (`l` below)
1131                // resolves to the offset `b.start`, because it is in the
1132                // set `b.labels_at_this_branch`.
1133                //
1134                // If L1 were already aliased, through some arbitrarily deep
1135                // chain, to Ln, then it must also resolve to this offset
1136                // `b.start`.
1137                //
1138                // By checking the resolution of `L1` against this offset,
1139                // and aborting this branch-simplification if they are
1140                // equal, we prevent the below assignment from ever creating
1141                // a cycle.
1142                if self.resolve_label_offset(b.target) != b.start {
1143                    let redirected = b.labels_at_this_branch.len();
1144                    for &l in &b.labels_at_this_branch {
1145                        trace!(
1146                            " -> label at start of branch {:?} redirected to target {:?}",
1147                            l, b.target
1148                        );
1149                        self.label_aliases[l.0 as usize] = b.target;
1150                        // NOTE: we continue to ensure the invariant that labels
1151                        // pointing to tail of buffer are in `labels_at_tail`
1152                        // because we already ensured above that the last branch
1153                        // cannot have a target of `cur_off`; so we never have
1154                        // to put the label into `labels_at_tail` when moving it
1155                        // here.
1156                    }
1157                    // Maintain invariant: all labels have been redirected
1158                    // and are no longer pointing at the start of this branch.
1159                    let mut_b = self.latest_branches.last_mut().unwrap();
1160                    mut_b.labels_at_this_branch.clear();
1161
1162                    if redirected > 0 {
1163                        trace!(" -> after label redirects, restarting loop");
1164                        continue;
1165                    }
1166                } else {
1167                    break;
1168                }
1169
1170                let b = self.latest_branches.last().unwrap();
1171
1172                // Examine any immediately preceding branch.
1173                if self.latest_branches.len() > 1 {
1174                    let prev_b = &self.latest_branches[self.latest_branches.len() - 2];
1175                    trace!(" -> more than one branch; prev_b = {:?}", prev_b);
1176                    // This uncond is immediately after another uncond; we
1177                    // should have already redirected labels away from this uncond
1178                    // (but check to be sure); so we can truncate this uncond.
1179                    if prev_b.is_uncond()
1180                        && prev_b.end == b.start
1181                        && b.labels_at_this_branch.is_empty()
1182                    {
1183                        trace!(" -> uncond follows another uncond; truncating");
1184                        self.truncate_last_branch();
1185                        continue;
1186                    }
1187
1188                    // This uncond is immediately after a conditional, and the
1189                    // conditional's target is the end of this uncond, and we've
1190                    // already redirected labels away from this uncond; so we can
1191                    // truncate this uncond, flip the sense of the conditional, and
1192                    // set the conditional's target (in `latest_branches` and in
1193                    // `fixup_records`) to the uncond's target.
1194                    if prev_b.is_cond()
1195                        && prev_b.end == b.start
1196                        && self.resolve_label_offset(prev_b.target) == cur_off
1197                    {
1198                        trace!(
1199                            " -> uncond follows a conditional, and conditional's target resolves to current offset"
1200                        );
1201                        // Save the target of the uncond (this becomes the
1202                        // target of the cond), and truncate the uncond.
1203                        let target = b.target;
1204                        let data = prev_b.inverted.clone().unwrap();
1205                        self.truncate_last_branch();
1206
1207                        // Mutate the code and cond branch.
1208                        let off_before_edit = self.cur_offset();
1209                        let prev_b = self.latest_branches.last_mut().unwrap();
1210                        let not_inverted = SmallVec::from(
1211                            &self.data[(prev_b.start as usize)..(prev_b.end as usize)],
1212                        );
1213
1214                        // Low-level edit: replaces bytes of branch with
1215                        // inverted form. cur_off remains the same afterward, so
1216                        // we do not need to modify label data structures.
1217                        self.data.truncate(prev_b.start as usize);
1218                        self.data.extend_from_slice(&data[..]);
1219
1220                        // Save the original code as the inversion of the
1221                        // inverted branch, in case we later edit this branch
1222                        // again.
1223                        prev_b.inverted = Some(not_inverted);
1224                        self.pending_fixup_records[prev_b.fixup].label = target;
1225                        trace!(" -> reassigning target of condbr to {:?}", target);
1226                        prev_b.target = target;
1227                        debug_assert_eq!(off_before_edit, self.cur_offset());
1228                        continue;
1229                    }
1230                }
1231            }
1232
1233            // If we couldn't do anything with the last branch, then break.
1234            break;
1235        }
1236
1237        self.purge_latest_branches();
1238
1239        trace!(
1240            "leave optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}",
1241            self.latest_branches, self.labels_at_tail, self.pending_fixup_records
1242        );
1243    }
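The cond-br/uncond-br collapse described in the comments above can be reduced to a toy decision function (this is an illustrative model, not the real `MachBranch` data structures): given `cond_br L2; br L3; L2:` at the buffer tail, the conditional "jumps around" the unconditional, so the unconditional is truncated and the conditional is rewritten in inverted form to target `L3` directly.

```rust
// Toy model of the cond/uncond collapse rule. Returns the new target for
// the (now inverted) conditional if the rule applies, else None.
fn collapse(cond_target: u32, uncond_target: u32, fallthrough_label: u32) -> Option<u32> {
    // The rule applies only when the conditional's target is the label
    // bound immediately after the unconditional branch.
    if cond_target == fallthrough_label {
        Some(uncond_target)
    } else {
        None
    }
}

fn main() {
    // cond_br L2; br L3; L2:  =>  cond_br.inverted L3; L2:
    assert_eq!(collapse(2, 3, 2), Some(3));
    // A conditional targeting elsewhere is left alone.
    assert_eq!(collapse(5, 3, 2), None);
}
```

In the real implementation, applying the rule also swaps the branch's machine-code bytes with the stored `inverted` form and updates the pending fixup record, as done in the loop above.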
1244
1245    fn purge_latest_branches(&mut self) {
1246        // All of our branch simplification rules work only if a branch ends at
1247        // the tail of the buffer, with no following code; and branches are in
1248        // order in latest_branches; so if the last entry ends prior to
1249        // cur_offset, then clear all entries.
1250        let cur_off = self.cur_offset();
1251        if let Some(l) = self.latest_branches.last() {
1252            if l.end < cur_off {
1253                trace!("purge_latest_branches: removing branch {:?}", l);
1254                self.latest_branches.clear();
1255            }
1256        }
1257
1258        // Post-invariant: no invariant requires any branch to appear in
1259        // `latest_branches`; it is always optional. The list-clear above thus
1260        // preserves all semantics.
1261    }
1262
1263    /// Emit a trap at some point in the future with the specified code and
1264    /// stack map.
1265    ///
1266    /// This function returns a [`MachLabel`] which will be the future address
1267    /// of the trap. Jumps should refer to this label, likely by using the
1268    /// [`MachBuffer::use_label_at_offset`] method, to get a relocation
1269    /// patched in once the address of the trap is known.
1270    ///
1271    /// This will batch all traps into the end of the function.
1272    pub fn defer_trap(&mut self, code: TrapCode) -> MachLabel {
1273        let label = self.get_label();
1274        self.pending_traps.push(MachLabelTrap {
1275            label,
1276            code,
1277            loc: self.cur_srcloc.map(|(_start, loc)| loc),
1278        });
1279        label
1280    }
1281
1282    /// Is an island needed within the next N bytes?
1283    pub fn island_needed(&self, distance: CodeOffset) -> bool {
1284        let deadline = match self.fixup_records.peek() {
1285            Some(fixup) => fixup.deadline().min(self.pending_fixup_deadline),
1286            None => self.pending_fixup_deadline,
1287        };
1288        trace!(
1289            "checking island_needed: cur_offset = {} deadline = {} worst_case_end_of_island = {}",
1290            self.cur_offset(),
1291            deadline,
1292            self.worst_case_end_of_island(distance)
1293        );
1294        let needed = deadline < u32::MAX && self.worst_case_end_of_island(distance) > deadline;
1295        trace!(" -> needed = {needed}");
1296        needed
1297    }
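The predicate above can be condensed: an island is needed when some recorded fixup has a finite deadline and the worst-case end of an island emitted after `distance` more bytes would overshoot that deadline. A minimal standalone sketch of that decision, with the deadline and worst-case end passed in as plain numbers (a hypothetical helper, not the actual `MachBuffer` API):

```rust
/// Standalone model of the `island_needed` predicate. `deadline` is the
/// earliest offset by which some fixup must be patched (`u32::MAX` means
/// "no deadline"), and `worst_case_end` is where an island emitted after
/// `distance` more bytes could end.
fn island_needed_model(deadline: u32, worst_case_end: u32) -> bool {
    // A deadline of u32::MAX means no fixup is range-limited, so an
    // island can never be "needed".
    deadline < u32::MAX && worst_case_end > deadline
}
```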
1298
1299    /// Returns the maximal offset that islands can reach if `distance` more
1300    /// bytes are appended.
1301    ///
1302    /// This is used to determine whether veneers need to be inserted, since
1303    /// jumps that can't reach past this point must get a veneer of some form.
1304    fn worst_case_end_of_island(&self, distance: CodeOffset) -> CodeOffset {
1305        // Assume that all fixups will require veneers and that the veneers are
1306        // the worst-case size for each platform. This is an over-approximation
1307        // to avoid iterating over the `fixup_records` list or maintaining
1308        // information about it as we go along.
1309        let island_worst_case_size = ((self.fixup_records.len() + self.pending_fixup_records.len())
1310            as u32)
1311            * (I::LabelUse::worst_case_veneer_size())
1312            + self.pending_constants_size
1313            + (self.pending_traps.len() * I::TRAP_OPCODE.len()) as u32;
1314        self.cur_offset()
1315            .saturating_add(distance)
1316            .saturating_add(island_worst_case_size)
1317    }
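As a hedged, standalone illustration of the arithmetic above: the island's worst-case end is the current offset, plus `distance`, plus one worst-case veneer per outstanding fixup, plus pending constants and traps, computed with saturating operations so an overflow pins to `u32::MAX` instead of wrapping. All inputs here are plain numbers; in the real buffer they come from the fixup lists and the backend's `LabelUse`/`TRAP_OPCODE` definitions.

```rust
/// Standalone sketch of `worst_case_end_of_island` (hypothetical helper).
fn worst_case_end(
    cur_offset: u32,
    distance: u32,
    num_fixups: u32,
    worst_veneer_size: u32,
    pending_constants_size: u32,
    pending_traps_size: u32,
) -> u32 {
    // Assume every fixup needs a worst-case veneer.
    let island_worst_case = num_fixups
        .saturating_mul(worst_veneer_size)
        .saturating_add(pending_constants_size)
        .saturating_add(pending_traps_size);
    // Saturating adds err toward emitting an island too early near
    // u32::MAX, never too late.
    cur_offset
        .saturating_add(distance)
        .saturating_add(island_worst_case)
}
```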
1318
1319    /// Emit all pending constants and required pending veneers.
1320    ///
1321    /// Should only be called if `island_needed()` returns true, i.e., if we
1322    /// actually reach a deadline. It's not necessarily a problem to do so
1323    /// otherwise but it may result in unnecessary work during emission.
1324    pub fn emit_island(&mut self, distance: CodeOffset, ctrl_plane: &mut ControlPlane) {
1325        self.emit_island_maybe_forced(ForceVeneers::No, distance, ctrl_plane);
1326    }
1327
1328    /// Same as `emit_island`, but an internal API with a `force_veneers`
1329    /// argument to force all veneers to always get emitted for debugging.
1330    fn emit_island_maybe_forced(
1331        &mut self,
1332        force_veneers: ForceVeneers,
1333        distance: CodeOffset,
1334        ctrl_plane: &mut ControlPlane,
1335    ) {
1336        trace!(
1337            "emitting island at {}, distance = {distance}",
1338            self.cur_offset()
1339        );
1340
1341        // We're going to purge fixups, so no latest-branch editing can happen
1342        // anymore.
1343        self.latest_branches.clear();
1344
1345        // End the current location tracking since anything emitted during this
1346        // function shouldn't be attributed to whatever the current source
1347        // location is.
1348        //
1349        // Note that the current source location, if it's set right now, will be
1350        // restored at the end of this island emission.
1351        let cur_loc = self.cur_srcloc.map(|(_, loc)| loc);
1352        if cur_loc.is_some() {
1353            self.end_srcloc();
1354        }
1355
1356        let forced_threshold = self.worst_case_end_of_island(distance);
1357        trace!("forced_threshold = {forced_threshold}");
1358
1359        // Emit traps/constants after the island: with potentially
1360        // unbounded pending constants/traps and potentially small
1361        // deadlines, it would otherwise be possible to emit a
1362        // small-range jump, have a nearby deadline *before* the end
1363        // of pending constants/traps, and not be able to emit a
1364        // veneer in time.
1365        //
1366        // Fixups whose labels aren't yet defined (e.g. references to
1367        // pending constants/traps) are simply deferred here; they'll
1368        // be resolved in the next island or in the final fixup pass
1369        // at the end of emission.
1370
1371        // Either handle all pending fixups because they're ready or move them
1372        // onto the `BinaryHeap` tracking all pending fixups if they aren't
1373        // ready.
1374        assert!(self.latest_branches.is_empty());
1375        trace!(
1376            "About to handle fixups at offset {}: {:?}",
1377            self.cur_offset(),
1378            self.pending_fixup_records
1379        );
1380        for fixup in mem::take(&mut self.pending_fixup_records) {
1381            if self.should_apply_fixup(&fixup, forced_threshold) {
1382                self.handle_fixup(fixup, force_veneers, forced_threshold);
1383            } else {
1384                self.fixup_records.push(fixup);
1385            }
1386        }
1387        self.pending_fixup_deadline = u32::MAX;
1388        while let Some(fixup) = self.fixup_records.peek() {
1389            trace!(
1390                "emit_island: fixup {:?} deadline {}",
1391                fixup,
1392                fixup.deadline()
1393            );
1394
1395            // If this fixup shouldn't be applied, that means its label isn't
1396            // defined yet and there'll be remaining space to apply a veneer if
1397            // necessary in the future after this island. In that situation,
1398            // because `fixup_records` is sorted by deadline, this loop can
1399            // exit early.
1400            if !self.should_apply_fixup(fixup, forced_threshold) {
1401                break;
1402            }
1403
1404            let fixup = self.fixup_records.pop().unwrap();
1405            self.handle_fixup(fixup, force_veneers, forced_threshold);
1406        }
1407
1408        // Now emit pending traps and constants.
1409        //
1410        // Note that traps are placed first since this typically happens at the
1411        // end of the function, and for disassemblers we try to keep all the
1412        // code contiguous.
1413        trace!("emitting pending traps: {:?}", self.pending_traps);
1414        for MachLabelTrap { label, code, loc } in mem::take(&mut self.pending_traps) {
1415            // If this trap has source information associated with it then
1416            // emit this information for the trap instruction going out now too.
1417            if let Some(loc) = loc {
1418                self.start_srcloc(loc);
1419            }
1420            self.align_to(I::LabelUse::ALIGN);
1421            self.bind_label(label, ctrl_plane);
1422            self.add_trap(code);
1423            self.put_data(I::TRAP_OPCODE);
1424            if loc.is_some() {
1425                self.end_srcloc();
1426            }
1427        }
1428
1429        trace!("emitting pending constants: {:?}", self.pending_constants);
1430        for constant in mem::take(&mut self.pending_constants) {
1431            let MachBufferConstant { align, size, .. } = self.constants[constant];
1432            let label = self.constants[constant].upcoming_label.take().unwrap();
1433            self.align_to(align);
1434            self.bind_label(label, ctrl_plane);
1435            self.used_constants.push((constant, self.cur_offset()));
1436            self.get_appended_space(size);
1437        }
1438
1439        if let Some(loc) = cur_loc {
1440            self.start_srcloc(loc);
1441        }
1442    }
1443
1444    fn should_apply_fixup(&self, fixup: &MachLabelFixup<I>, forced_threshold: CodeOffset) -> bool {
1445        let label_offset = self.resolve_label_offset(fixup.label);
1446        trace!(
1447            "should_apply_fixup: fixup {fixup:?} label_offset {label_offset} deadline {} forced_threshold {forced_threshold} supports_veneer {}",
1448            fixup.deadline(),
1449            fixup.kind.supports_veneer()
1450        );
1451        let result = (label_offset != UNKNOWN_LABEL_OFFSET)
1452            || ((fixup.deadline() < forced_threshold) && fixup.kind.supports_veneer());
1453        trace!(
1454            " -> {}, {}, {} -> {result}",
1455            label_offset != UNKNOWN_LABEL_OFFSET,
1456            fixup.deadline() < forced_threshold,
1457            fixup.kind.supports_veneer()
1458        );
1459        result
1460    }
1461
1462    fn handle_fixup(
1463        &mut self,
1464        fixup: MachLabelFixup<I>,
1465        force_veneers: ForceVeneers,
1466        forced_threshold: CodeOffset,
1467    ) {
1468        let MachLabelFixup {
1469            label,
1470            offset,
1471            kind,
1472        } = fixup;
1473        let start = offset as usize;
1474        let end = (offset + kind.patch_size()) as usize;
1475        let label_offset = self.resolve_label_offset(label);
1476
1477        if label_offset != UNKNOWN_LABEL_OFFSET {
1478            // If the offset of the label for this fixup is known then
1479            // we're going to do something here-and-now. We're either going
1480            // to patch the original offset because it's an in-bounds jump,
1481            // or we're going to generate a veneer, patch the fixup to jump
1482            // to the veneer, and then keep going.
1483            //
1484            // If the label comes after the original fixup, then we should
1485            // be guaranteed that the jump is in-bounds. Otherwise there's
1486            // a bug somewhere because this method wasn't called soon
1487            // enough. All forward-jumps are tracked and should get veneers
1488            // before their deadline comes and they're unable to jump
1489            // further.
1490            //
1491            // Otherwise if the label is before the fixup, then that's a
1492            // backwards jump. If it's past the maximum negative range
1493            // then we'll emit a veneer that we can jump forward to, which
1494            // can then jump backwards.
1495            let veneer_required = if label_offset >= offset {
1496                assert!((label_offset - offset) <= kind.max_pos_range());
1497                false
1498            } else {
1499                (offset - label_offset) > kind.max_neg_range()
1500            };
1501            trace!(
1502                " -> label_offset = {}, known, required = {} (pos {} neg {})",
1503                label_offset,
1504                veneer_required,
1505                kind.max_pos_range(),
1506                kind.max_neg_range()
1507            );
1508
1509            if (force_veneers == ForceVeneers::Yes && kind.supports_veneer()) || veneer_required {
1510                self.emit_veneer(label, offset, kind);
1511            } else {
1512                let slice = &mut self.data[start..end];
1513                trace!(
1514                    "patching in-range! slice = {slice:?}; offset = {offset:#x}; label_offset = {label_offset:#x}"
1515                );
1516                kind.patch(slice, offset, label_offset);
1517            }
1518        } else {
1519            // If the offset of this label is not known at this time then
1520            // a veneer is required, because after this island the original
1521            // use would no longer be able to reach its target.
1522            assert!(forced_threshold - offset > kind.max_pos_range());
1523            self.emit_veneer(label, offset, kind);
1524        }
1525    }
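The in-range test above can be shown in isolation. In this sketch the positive and negative reaches are example constants (hypothetical numbers, not any real backend's `LabelUse` ranges): a forward reference is asserted to be in range, while a backward reference needs a veneer only when it exceeds the kind's negative reach.

```rust
/// Standalone sketch of the `veneer_required` computation in
/// `handle_fixup`. `offset` is where the fixup's instruction lives and
/// `label_offset` is the resolved target.
fn veneer_required(offset: u32, label_offset: u32, max_pos: u32, max_neg: u32) -> bool {
    if label_offset >= offset {
        // Forward (or self) reference: the island machinery guarantees
        // this is reachable, so no veneer is needed.
        assert!(label_offset - offset <= max_pos);
        false
    } else {
        // Backward reference: a veneer is needed only if the target is
        // beyond the kind's negative range.
        offset - label_offset > max_neg
    }
}
```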
1526
1527    /// Emits a "veneer" so that the use of `kind` at `offset` can jump to `label`.
1528    ///
1529    /// This will generate extra machine code, using `kind`, to get a
1530    /// larger-jump-kind than `kind` allows. The code at `offset` is then
1531    /// patched to jump to our new code, and then the new code is enqueued for
1532    /// a fixup to get processed at some later time.
1533    fn emit_veneer(&mut self, label: MachLabel, offset: CodeOffset, kind: I::LabelUse) {
1534        // If this `kind` doesn't support a veneer then that's a bug in the
1535        // backend because we need to implement support for such a veneer.
1536        assert!(
1537            kind.supports_veneer(),
1538            "jump beyond the range of {kind:?} but a veneer isn't supported",
1539        );
1540
1541        // Allocate space for a veneer in the island.
1542        self.align_to(I::LabelUse::ALIGN);
1543        let veneer_offset = self.cur_offset();
1544        trace!("making a veneer at {}", veneer_offset);
1545        let start = offset as usize;
1546        let end = (offset + kind.patch_size()) as usize;
1547        let slice = &mut self.data[start..end];
1548        // Patch the original label use to refer to the veneer.
1549        trace!(
1550            "patching original at offset {} to veneer offset {}",
1551            offset, veneer_offset
1552        );
1553        kind.patch(slice, offset, veneer_offset);
1554        // Generate the veneer.
1555        let veneer_slice = self.get_appended_space(kind.veneer_size() as usize);
1556        let (veneer_fixup_off, veneer_label_use) =
1557            kind.generate_veneer(veneer_slice, veneer_offset);
1558        trace!(
1559            "generated veneer; fixup offset {}, label_use {:?}",
1560            veneer_fixup_off, veneer_label_use
1561        );
1562        // Register a new use of `label` with our new veneer fixup and
1563        // offset. This'll recalculate deadlines accordingly and
1564        // enqueue this fixup to get processed at some later
1565        // time.
1566        self.use_label_at_offset(veneer_fixup_off, label, veneer_label_use);
1567    }
1568
1569    fn finish_emission_maybe_forcing_veneers(
1570        &mut self,
1571        force_veneers: ForceVeneers,
1572        ctrl_plane: &mut ControlPlane,
1573    ) {
1574        while !self.pending_constants.is_empty()
1575            || !self.pending_traps.is_empty()
1576            || !self.fixup_records.is_empty()
1577            || !self.pending_fixup_records.is_empty()
1578        {
1579            // `emit_island()` will emit any pending veneers and constants, and
1580            // as a side-effect, will also take care of any fixups with resolved
1581            // labels eagerly.
1582            self.emit_island_maybe_forced(force_veneers, 0, ctrl_plane);
1583        }
1584
1585        // Ensure that all labels have been fixed up after the last island is emitted. This is a
1586        // full (release-mode) assert because an unresolved label means the emitted code is
1587        // incorrect.
1588        assert!(self.fixup_records.is_empty());
1589        assert!(self.pending_fixup_records.is_empty());
1590    }
1591
1592    /// Finish any deferred emissions and/or fixups.
1593    pub fn finish(
1594        mut self,
1595        constants: &VCodeConstants,
1596        ctrl_plane: &mut ControlPlane,
1597    ) -> MachBufferFinalized<Stencil> {
1598        let _tt = timing::vcode_emit_finish();
1599
1600        self.finish_emission_maybe_forcing_veneers(ForceVeneers::No, ctrl_plane);
1601
1602        let alignment = self.finish_constants(constants);
1603
1604        // Resolve all labels to their offsets.
1605        let finalized_relocs = self
1606            .relocs
1607            .iter()
1608            .map(|reloc| FinalizedMachReloc {
1609                offset: reloc.offset,
1610                kind: reloc.kind,
1611                addend: reloc.addend,
1612                target: match &reloc.target {
1613                    RelocTarget::ExternalName(name) => {
1614                        FinalizedRelocTarget::ExternalName(name.clone())
1615                    }
1616                    RelocTarget::Label(label) => {
1617                        FinalizedRelocTarget::Func(self.resolve_label_offset(*label))
1618                    }
1619                },
1620            })
1621            .collect();
1622
1623        let finalized_exception_handlers = self
1624            .exception_handlers
1625            .iter()
1626            .map(|handler| handler.finalize(|label| self.resolve_label_offset(label)))
1627            .collect();
1628
1629        let mut srclocs = self.srclocs;
1630        srclocs.sort_by_key(|entry| entry.start);
1631
1632        MachBufferFinalized {
1633            data: self.data,
1634            relocs: finalized_relocs,
1635            traps: self.traps,
1636            call_sites: self.call_sites,
1637            patchable_call_sites: self.patchable_call_sites,
1638            exception_handlers: finalized_exception_handlers,
1639            srclocs,
1640            debug_tags: self.debug_tags,
1641            debug_tag_pool: self.debug_tag_pool,
1642            user_stack_maps: self.user_stack_maps,
1643            unwind_info: self.unwind_info,
1644            alignment,
1645            frame_layout: self.frame_layout,
1646            nop_units: I::gen_nop_units(),
1647        }
1648    }
1649
1650    /// Add an external relocation at the given offset.
1651    pub fn add_reloc_at_offset<T: Into<RelocTarget> + Clone>(
1652        &mut self,
1653        offset: CodeOffset,
1654        kind: Reloc,
1655        target: &T,
1656        addend: Addend,
1657    ) {
1658        let target: RelocTarget = target.clone().into();
1659        // FIXME(#3277): This should use `I::LabelUse::from_reloc` to optionally
1660        // generate a label-use statement to track whether an island is possibly
1661        // needed to escape this function to actually get to the external name.
1662        // This is most likely to come up on AArch64 where calls between
1663        // functions use a 26-bit signed offset which gives +/- 64MB. This means
1664        // that if a function is 128MB in size and there's a call in the middle
1665        // it's impossible to reach the actual target. Also, while it's
1666        // technically possible to jump to the start of a function and then jump
1667        // further, island insertion below always inserts islands after
1668        // previously appended code, so for Cranelift's own implementation this
1669        // is also a problem for 64MB functions on AArch64 that start with a
1670        // call instruction: those won't be able to escape.
1671        //
1672        // Ideally what needs to happen here is that a `LabelUse` is
1673        // transparently generated (or call-sites of this function are audited
1674        // to generate a `LabelUse` instead) and tracked internally. The actual
1675        // relocation would then change over time if and when a veneer is
1676        // inserted, where the relocation here would be patched by this
1677        // `MachBuffer` to jump to the veneer. The problem, though, is that all
1678        // this still needs to end up, in the case of a single function,
1679        // generating a final relocation pointing either to this particular
1680        // relocation or to the veneer inserted. Additionally
1681        // `MachBuffer` needs the concept of a label which will never be
1682        // resolved, so `emit_island` doesn't trip over not actually ever
1683        // knowing what some labels are. Currently the loop in
1684        // `finish_emission_maybe_forcing_veneers` would otherwise infinitely
1685        // loop.
1686        //
1687        // For now this means that, because relocs aren't tracked at all,
1688        // AArch64 functions have a rough size limit of 64MB. For now that's
1689        // somewhat reasonable, and the failure mode is a panic in `MachBuffer`
1690        // when a relocation can't otherwise be resolved later, so it shouldn't
1691        // actually result in any memory unsafety or anything like that.
1692        self.relocs.push(MachReloc {
1693            offset,
1694            kind,
1695            target,
1696            addend,
1697        });
1698    }
1699
1700    /// Add an external relocation at the current offset.
1701    pub fn add_reloc<T: Into<RelocTarget> + Clone>(
1702        &mut self,
1703        kind: Reloc,
1704        target: &T,
1705        addend: Addend,
1706    ) {
1707        self.add_reloc_at_offset(self.data.len() as CodeOffset, kind, target, addend);
1708    }
1709
1710    /// Add a trap record at the current offset.
1711    pub fn add_trap(&mut self, code: TrapCode) {
1712        self.traps.push(MachTrap {
1713            offset: self.data.len() as CodeOffset,
1714            code,
1715        });
1716    }
1717
1718    /// Add a call-site record at the current offset.
1719    pub fn add_call_site(&mut self) {
1720        self.add_try_call_site(None, core::iter::empty());
1721    }
1722
1723    /// Add a call-site record at the current offset with exception
1724    /// handlers.
1725    pub fn add_try_call_site(
1726        &mut self,
1727        frame_offset: Option<u32>,
1728        exception_handlers: impl Iterator<Item = MachExceptionHandler>,
1729    ) {
1730        let start = u32::try_from(self.exception_handlers.len()).unwrap();
1731        self.exception_handlers.extend(exception_handlers);
1732        let end = u32::try_from(self.exception_handlers.len()).unwrap();
1733        let exception_handler_range = start..end;
1734
1735        self.call_sites.push(MachCallSite {
1736            ret_addr: self.data.len() as CodeOffset,
1737            frame_offset,
1738            exception_handler_range,
1739        });
1740    }
1741
1742    /// Add a patchable call record at the current offset. The actual
1743    /// call is expected to have been emitted; the `VCodeInst` trait
1744    /// specifies how to NOP it out, and we carry that information to
1745    /// the finalized `MachBuffer`.
1746    pub fn add_patchable_call_site(&mut self, len: u32) {
1747        self.patchable_call_sites.push(MachPatchableCallSite {
1748            ret_addr: self.cur_offset(),
1749            len,
1750        });
1751    }
1752
1753    /// Add an unwind record at the current offset.
1754    pub fn add_unwind(&mut self, unwind: UnwindInst) {
1755        self.unwind_info.push((self.cur_offset(), unwind));
1756    }
1757
1758    /// Set the `SourceLoc` for code from this offset until the offset at the
1759    /// next call to `end_srcloc()`.
1760    /// Returns the current [CodeOffset] and [RelSourceLoc].
1761    pub fn start_srcloc(&mut self, loc: RelSourceLoc) -> (CodeOffset, RelSourceLoc) {
1762        let cur = (self.cur_offset(), loc);
1763        self.cur_srcloc = Some(cur);
1764        cur
1765    }
1766
1767    /// Mark the end of the `SourceLoc` segment started at the last
1768    /// `start_srcloc()` call.
1769    pub fn end_srcloc(&mut self) {
1770        let (start, loc) = self
1771            .cur_srcloc
1772            .take()
1773            .expect("end_srcloc() called without start_srcloc()");
1774        let end = self.cur_offset();
1775        // Skip zero-length ranges.
1776        debug_assert!(end >= start);
1777        if end > start {
1778            self.srclocs.push(MachSrcLoc { start, end, loc });
1779        }
1780    }
1781
1782    /// Push a user stack map onto this buffer.
1783    ///
1784    /// The stack map is associated with the given `return_addr` code
1785    /// offset. This must be the PC for the instruction just *after* this stack
1786    /// map's associated instruction. For example in the sequence `call $foo;
1787    /// add r8, rax`, the `return_addr` must be the offset of the start of the
1788    /// `add` instruction.
1789    ///
1790    /// Stack maps must be pushed in sorted `return_addr` order.
1791    pub fn push_user_stack_map(
1792        &mut self,
1793        emit_state: &I::State,
1794        return_addr: CodeOffset,
1795        mut stack_map: ir::UserStackMap,
1796    ) {
1797        let span = emit_state.frame_layout().active_size();
1798        trace!("Adding user stack map @ {return_addr:#x} spanning {span} bytes: {stack_map:?}");
1799
1800        debug_assert!(
1801            self.user_stack_maps
1802                .last()
1803                .map_or(true, |(prev_addr, _, _)| *prev_addr < return_addr),
1804            "pushed stack maps out of order: {} is not less than {}",
1805            self.user_stack_maps.last().unwrap().0,
1806            return_addr,
1807        );
1808
1809        stack_map.finalize(emit_state.frame_layout().sp_to_sized_stack_slots());
1810        self.user_stack_maps.push((return_addr, span, stack_map));
1811    }
1812
1813    /// Push a debug tag associated with the current buffer offset.
1814    pub fn push_debug_tags(&mut self, pos: MachDebugTagPos, tags: &[DebugTag]) {
1815        trace!("debug tags at offset {}: {tags:?}", self.cur_offset());
1816        let start = u32::try_from(self.debug_tag_pool.len()).unwrap();
1817        self.debug_tag_pool.extend(tags.iter().cloned());
1818        let end = u32::try_from(self.debug_tag_pool.len()).unwrap();
1819        self.debug_tags.push(MachDebugTags {
1820            offset: self.cur_offset(),
1821            pos,
1822            range: start..end,
1823        });
1824    }
1825
1826    /// Increase the alignment of the buffer to the given alignment if bigger
1827    /// than the current alignment.
1828    pub fn set_log2_min_function_alignment(&mut self, align_to: u8) {
1829        self.min_alignment = self.min_alignment.max(
1830            1u32.checked_shl(u32::from(align_to))
1831                .expect("log2_min_function_alignment too large"),
1832        );
1833    }
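A standalone sketch of the log2-to-bytes conversion used above: `checked_shl` returns `None` when the shift amount is 32 or more, turning an absurd alignment request into a panic instead of a silent wrap. The helper name here is hypothetical; the real method folds the result into `self.min_alignment`.

```rust
/// Convert a log2 alignment (e.g. 4 for 16-byte alignment) into a byte
/// alignment, and fold it into a running minimum alignment, as
/// `set_log2_min_function_alignment` does.
fn raise_min_alignment(current_min: u32, log2_align: u8) -> u32 {
    let align_bytes = 1u32
        .checked_shl(u32::from(log2_align))
        .expect("log2 alignment too large");
    // The minimum alignment only ever grows.
    current_min.max(align_bytes)
}
```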
1834
1835    /// Set the frame layout metadata.
1836    pub fn set_frame_layout(&mut self, frame_layout: MachBufferFrameLayout) {
1837        debug_assert!(self.frame_layout.is_none());
1838        self.frame_layout = Some(frame_layout);
1839    }
1840}
1841
1842impl<I: VCodeInst> Extend<u8> for MachBuffer<I> {
1843    fn extend<T: IntoIterator<Item = u8>>(&mut self, iter: T) {
1844        for b in iter {
1845            self.put1(b);
1846        }
1847    }
1848}
1849
1850impl<T: CompilePhase> MachBufferFinalized<T> {
1851    /// Get a list of source location mapping tuples in sorted-by-start-offset order.
1852    pub fn get_srclocs_sorted(&self) -> &[T::MachSrcLocType] {
1853        &self.srclocs[..]
1854    }
1855
1856    /// Get all debug tags, sorted by associated offset.
1857    pub fn debug_tags(&self) -> impl Iterator<Item = MachBufferDebugTagList<'_>> {
1858        self.debug_tags.iter().map(|tags| {
1859            let start = usize::try_from(tags.range.start).unwrap();
1860            let end = usize::try_from(tags.range.end).unwrap();
1861            MachBufferDebugTagList {
1862                offset: tags.offset,
1863                pos: tags.pos,
1864                tags: &self.debug_tag_pool[start..end],
1865            }
1866        })
1867    }
1868
1869    /// Get the total required size for the code.
1870    pub fn total_size(&self) -> CodeOffset {
1871        self.data.len() as CodeOffset
1872    }
1873
1874    /// Return the code in this mach buffer as a hex string for testing purposes.
1875    pub fn stringify_code_bytes(&self) -> String {
1876        // This is pretty lame, but whatever ..
1877        use core::fmt::Write;
1878        let mut s = String::with_capacity(self.data.len() * 2);
1879        for b in &self.data {
1880            write!(&mut s, "{b:02X}").unwrap();
1881        }
1882        s
1883    }
1884
1885    /// Get the code bytes.
1886    pub fn data(&self) -> &[u8] {
1887        // N.B.: we emit every section into the .text section as far as
1888        // the `CodeSink` is concerned; we do not bother to segregate
1889        // the contents into the actual program text, the jumptable and the
1890        // rodata (constant pool). This allows us to generate code assuming
1891        // that these will not be relocated relative to each other, and avoids
1892        // having to designate each section as belonging in one of the three
1893        // fixed categories defined by `CodeSink`. If this becomes a problem
1894        // later (e.g. because of memory permissions or similar), we can
1895        // add this designation and segregate the output; take care, however,
1896        // to add the appropriate relocations in this case.
1897
1898        &self.data[..]
1899    }
1900
1901    /// Get a mutable slice of the code bytes, allowing patching
1902    /// post-passes.
1903    pub fn data_mut(&mut self) -> &mut [u8] {
1904        &mut self.data[..]
1905    }
1906
1907    /// Get the list of external relocations for this code.
1908    pub fn relocs(&self) -> &[FinalizedMachReloc] {
1909        &self.relocs[..]
1910    }
1911
1912    /// Get the list of trap records for this code.
1913    pub fn traps(&self) -> &[MachTrap] {
1914        &self.traps[..]
1915    }
1916
1917    /// Get the user stack map metadata for this code.
1918    pub fn user_stack_maps(&self) -> &[(CodeOffset, u32, ir::UserStackMap)] {
1919        &self.user_stack_maps
1920    }
1921
1922    /// Take this buffer's user stack map metadata.
1923    pub fn take_user_stack_maps(&mut self) -> SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]> {
1924        mem::take(&mut self.user_stack_maps)
1925    }
1926
1927    /// Get the list of call sites for this code, along with
1928    /// associated exception handlers.
1929    ///
1930    /// Each item yielded by the returned iterator is a struct with:
1931    ///
1932    /// - The call site metadata record, with a `ret_addr` field
1933    ///   directly accessible and denoting the offset of the return
1934    ///   address into this buffer's code.
1935    /// - The slice of pairs of exception tags and code offsets
1936    ///   denoting exception-handler entry points associated with this
1937    ///   call site.
1938    pub fn call_sites(&self) -> impl Iterator<Item = FinalizedMachCallSite<'_>> + '_ {
1939        self.call_sites.iter().map(|call_site| {
1940            let handler_range = call_site.exception_handler_range.clone();
1941            let handler_range = usize::try_from(handler_range.start).unwrap()
1942                ..usize::try_from(handler_range.end).unwrap();
1943            FinalizedMachCallSite {
1944                ret_addr: call_site.ret_addr,
1945                frame_offset: call_site.frame_offset,
1946                exception_handlers: &self.exception_handlers[handler_range],
1947            }
1948        })
1949    }
1950
1951    /// Get the frame layout, if known.
1952    pub fn frame_layout(&self) -> Option<&MachBufferFrameLayout> {
1953        self.frame_layout.as_ref()
1954    }
1955
1956    /// Get the list of patchable call sites for this code.
1957    ///
1958    /// Each location in the buffer contains the bytes for a call
1959    /// instruction to the specified target. If the call is to be
1960    /// patched out, the bytes in the region should be replaced with
1961    /// those given in the `MachBufferFinalized::nop_units` array, repeated
1962    /// as many times as necessary. (The length of the patchable
1963    /// region is guaranteed to be an integer multiple of that NOP
1964    /// unit size.)
1965    pub fn patchable_call_sites(&self) -> impl Iterator<Item = &MachPatchableCallSite> + '_ {
1966        self.patchable_call_sites.iter()
1967    }
1968}

/// An item in the exception-handler list for a callsite, with label
/// references. Items are interpreted in left-to-right order and the
/// first match wins.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MachExceptionHandler {
    /// A specific tag (in the current dynamic context) should be
    /// handled by the code at the given offset.
    Tag(ExceptionTag, MachLabel),
    /// All exceptions should be handled by the code at the given
    /// offset.
    Default(MachLabel),
    /// The dynamic context for interpreting tags is updated to the
    /// value stored in the given machine location (in this frame's
    /// context).
    Context(ExceptionContextLoc),
}

impl MachExceptionHandler {
    fn finalize<F: Fn(MachLabel) -> CodeOffset>(self, f: F) -> FinalizedMachExceptionHandler {
        match self {
            Self::Tag(tag, label) => FinalizedMachExceptionHandler::Tag(tag, f(label)),
            Self::Default(label) => FinalizedMachExceptionHandler::Default(f(label)),
            Self::Context(loc) => FinalizedMachExceptionHandler::Context(loc),
        }
    }
}

/// An item in the exception-handler list for a callsite, with final
/// (lowered) code offsets. Items are interpreted in left-to-right
/// order and the first match wins.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub enum FinalizedMachExceptionHandler {
    /// A specific tag (in the current dynamic context) should be
    /// handled by the code at the given offset.
    Tag(ExceptionTag, CodeOffset),
    /// All exceptions should be handled by the code at the given
    /// offset.
    Default(CodeOffset),
    /// The dynamic context for interpreting tags is updated to the
    /// value stored in the given machine location (in this frame's
    /// context).
    Context(ExceptionContextLoc),
}

/// A location for a dynamic exception context value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub enum ExceptionContextLoc {
    /// An offset from SP at the callsite.
    SPOffset(u32),
    /// A GPR at the callsite. The physical register number for the
    /// GPR register file on the target architecture is used.
    GPR(u8),
}

/// Metadata about a constant.
struct MachBufferConstant {
    /// A label which has not yet been bound which can be used for this
    /// constant.
    ///
    /// This is lazily created when a label is requested for a constant and is
    /// cleared when a constant is emitted.
    upcoming_label: Option<MachLabel>,
    /// Required alignment.
    align: CodeOffset,
    /// The byte size of this constant.
    size: usize,
}

/// A trap that is deferred to the next time an island is emitted for either
/// traps, constants, or fixups.
#[derive(Debug)]
struct MachLabelTrap {
    /// This label will refer to the trap's offset.
    label: MachLabel,
    /// The code associated with this trap.
    code: TrapCode,
    /// An optional source location to assign for this trap.
    loc: Option<RelSourceLoc>,
}

/// A fixup to perform on the buffer once code is emitted. Fixups always refer
/// to labels and patch the code based on label offsets. Hence, they are like
/// relocations, but internal to one buffer.
#[derive(Debug)]
struct MachLabelFixup<I: VCodeInst> {
    /// The label whose offset controls this fixup.
    label: MachLabel,
    /// The offset to fix up / patch to refer to this label.
    offset: CodeOffset,
    /// The kind of fixup. This is architecture-specific; each architecture may have,
    /// e.g., several types of branch instructions, each with differently-sized
    /// offset fields and different places within the instruction to place the
    /// bits.
    kind: I::LabelUse,
}

impl<I: VCodeInst> MachLabelFixup<I> {
    fn deadline(&self) -> CodeOffset {
        self.offset.saturating_add(self.kind.max_pos_range())
    }
}

impl<I: VCodeInst> PartialEq for MachLabelFixup<I> {
    fn eq(&self, other: &Self) -> bool {
        self.deadline() == other.deadline()
    }
}

impl<I: VCodeInst> Eq for MachLabelFixup<I> {}

impl<I: VCodeInst> PartialOrd for MachLabelFixup<I> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl<I: VCodeInst> Ord for MachLabelFixup<I> {
    fn cmp(&self, other: &Self) -> Ordering {
        other.deadline().cmp(&self.deadline())
    }
}

/// A relocation resulting from a compilation.
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub struct MachRelocBase<T> {
    /// The offset at which the relocation applies, *relative to the
    /// containing section*.
    pub offset: CodeOffset,
    /// The kind of relocation.
    pub kind: Reloc,
    /// The external symbol / name to which this relocation refers.
    pub target: T,
    /// The addend to add to the symbol value.
    pub addend: i64,
}

type MachReloc = MachRelocBase<RelocTarget>;

/// A relocation resulting from a compilation.
pub type FinalizedMachReloc = MachRelocBase<FinalizedRelocTarget>;

/// A relocation target.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum RelocTarget {
    /// Points to an [ExternalName] outside the current function.
    ExternalName(ExternalName),
    /// Points to a [MachLabel] inside this function.
    /// This is different from [MachLabelFixup] in that both the relocation and the
    /// label will be emitted and are only resolved at link time.
    ///
    /// There is no reason to prefer this over [MachLabelFixup] unless the ABI requires it.
    Label(MachLabel),
}

impl From<ExternalName> for RelocTarget {
    fn from(name: ExternalName) -> Self {
        Self::ExternalName(name)
    }
}

impl From<MachLabel> for RelocTarget {
    fn from(label: MachLabel) -> Self {
        Self::Label(label)
    }
}

/// A relocation target.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub enum FinalizedRelocTarget {
    /// Points to an [ExternalName] outside the current function.
    ExternalName(ExternalName),
    /// Points to a [CodeOffset] from the start of the current function.
    Func(CodeOffset),
}

impl FinalizedRelocTarget {
    /// Returns a display for the current [FinalizedRelocTarget], with extra context to prettify the
    /// output.
    pub fn display<'a>(&'a self, params: Option<&'a FunctionParameters>) -> String {
        match self {
            FinalizedRelocTarget::ExternalName(name) => format!("{}", name.display(params)),
            FinalizedRelocTarget::Func(offset) => format!("func+{offset}"),
        }
    }
}

/// A trap record resulting from a compilation.
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub struct MachTrap {
    /// The offset at which the trap instruction occurs, *relative to the
    /// containing section*.
    pub offset: CodeOffset,
    /// The trap code.
    pub code: TrapCode,
}

/// A call site record resulting from a compilation.
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub struct MachCallSite {
    /// The offset of the call's return address, *relative to the
    /// start of the buffer*.
    pub ret_addr: CodeOffset,

    /// The offset from the FP at this callsite down to the SP when
    /// the call occurs, if known. In other words, the size of the
    /// stack frame up to the saved FP slot. Useful to recover the
    /// start of the stack frame and to look up dynamic contexts
    /// stored in [`ExceptionContextLoc::SPOffset`].
    ///
    /// If `None`, the compiler backend did not specify a frame
    /// offset. The runtime in use with the compiled code may require
    /// the frame offset if exception handlers are present or dynamic
    /// context is used, but that is not Cranelift's concern: the
    /// frame offset is optional at this level.
    pub frame_offset: Option<u32>,

    /// Range in `exception_handlers` corresponding to the exception
    /// handlers for this callsite.
    exception_handler_range: Range<u32>,
}

/// A call site record resulting from a compilation.
#[derive(Clone, Debug, PartialEq)]
pub struct FinalizedMachCallSite<'a> {
    /// The offset of the call's return address, *relative to the
    /// start of the buffer*.
    pub ret_addr: CodeOffset,

    /// The offset from the FP at this callsite down to the SP when
    /// the call occurs, if known. In other words, the size of the
    /// stack frame up to the saved FP slot. Useful to recover the
    /// start of the stack frame and to look up dynamic contexts
    /// stored in [`ExceptionContextLoc::SPOffset`].
    ///
    /// If `None`, the compiler backend did not specify a frame
    /// offset. The runtime in use with the compiled code may require
    /// the frame offset if exception handlers are present or dynamic
    /// context is used, but that is not Cranelift's concern: the
    /// frame offset is optional at this level.
    pub frame_offset: Option<u32>,

    /// Exception handlers at this callsite, with target offsets
    /// *relative to the start of the buffer*.
    pub exception_handlers: &'a [FinalizedMachExceptionHandler],
}

/// A patchable call site record resulting from a compilation.
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub struct MachPatchableCallSite {
    /// The offset of the call's return address (i.e., the address
    /// after the end of the patchable region), *relative to the start
    /// of the buffer*.
    pub ret_addr: CodeOffset,

    /// The length of the region to be patched with NOP bytes.
    pub len: u32,
}

/// A source-location mapping resulting from a compilation.
#[derive(PartialEq, Debug, Clone)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub struct MachSrcLoc<T: CompilePhase> {
    /// The start of the region of code corresponding to a source location.
    /// This is relative to the start of the function, not to the start of the
    /// section.
    pub start: CodeOffset,
    /// The end of the region of code corresponding to a source location.
    /// This is relative to the start of the function, not to the start of the
    /// section.
    pub end: CodeOffset,
    /// The source location.
    pub loc: T::SourceLocType,
}

impl MachSrcLoc<Stencil> {
    fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachSrcLoc<Final> {
        MachSrcLoc {
            start: self.start,
            end: self.end,
            loc: self.loc.expand(base_srcloc),
        }
    }
}

/// Record of a branch instruction in the buffer, to facilitate editing.
#[derive(Clone, Debug)]
struct MachBranch {
    start: CodeOffset,
    end: CodeOffset,
    target: MachLabel,
    fixup: usize,
    inverted: Option<SmallVec<[u8; 8]>>,
    /// All labels pointing to the start of this branch. For correctness, this
    /// *must* be complete (i.e., must contain all labels whose resolved offsets
    /// are at the start of this branch): we rely on being able to redirect all
    /// labels that could jump to this branch before removing it, if it is
    /// otherwise unreachable.
    labels_at_this_branch: SmallVec<[MachLabel; 4]>,
}

impl MachBranch {
    fn is_cond(&self) -> bool {
        self.inverted.is_some()
    }
    fn is_uncond(&self) -> bool {
        self.inverted.is_none()
    }
}

/// Stack-frame layout information carried through to machine
/// code. This provides sufficient information to interpret an active
/// stack frame from a running function, if provided.
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub struct MachBufferFrameLayout {
    /// Offset from bottom of frame to FP (near top of frame). This
    /// allows reading the frame given only FP.
    pub frame_to_fp_offset: u32,
    /// Offset from bottom of frame for each StackSlot.
    pub stackslots: SecondaryMap<ir::StackSlot, MachBufferStackSlot>,
}

/// Descriptor for a single stack slot in the compiled function.
#[derive(Clone, Debug, PartialEq, Default)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub struct MachBufferStackSlot {
    /// Offset from the bottom of the stack frame.
    pub offset: u32,

    /// User-provided key to describe this stack slot.
    pub key: Option<ir::StackSlotKey>,
}

/// Debug tags: a sequence of references to a stack slot, or a
/// user-defined value, at a particular PC.
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub(crate) struct MachDebugTags {
    /// Offset at which this tag applies.
    pub offset: CodeOffset,

    /// Position on the attached instruction. This indicates whether
    /// the tags attach to the prior instruction (i.e., as a return
    /// point from a call) or the current instruction (i.e., as a PC
    /// seen during a trap).
    pub pos: MachDebugTagPos,

    /// The range in the tag pool.
    pub range: Range<u32>,
}

/// Debug tag position on an instruction.
///
/// We need to distinguish position on an instruction, and not just
/// use offsets, because of the following case:
///
/// ```plain
/// <tag1, tag2> call ...
/// <tag3, tag4> trapping_store ...
/// ```
///
/// If the stack is walked and interpreted with debug tags while
/// within the call, the PC seen will be the return point, i.e. the
/// address after the call. If the stack is walked and interpreted
/// with debug tags upon a trap of the following instruction, it will
/// be the PC of that instruction -- which is the same PC! Thus to
/// disambiguate which tags we want, we attach a "pre/post" flag to
/// every group of tags at an offset; and when we look up tags, we
/// look them up for an offset and "position" at that offset.
///
/// Thus there are logically two positions at every offset -- so the
/// above will be emitted as
///
/// ```plain
/// 0: call ...
///                          4, post: <tag1, tag2>
///                          4, pre: <tag3, tag4>
/// 4: trapping_store ...
/// ```
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[cfg_attr(
    feature = "enable-serde",
    derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub enum MachDebugTagPos {
    /// Tags attached after the instruction that ends at this offset.
    ///
    /// This is used to attach tags to a call, because the PC we see
    /// when walking the stack is the *return point*.
    Post,
    /// Tags attached before the instruction that starts at this offset.
    ///
    /// This is used to attach tags to every other kind of
    /// instruction, because the PC we see when processing a trap of
    /// that instruction is the PC of that instruction, not the
    /// following one.
    Pre,
}

/// Iterator item for visiting debug tags.
pub struct MachBufferDebugTagList<'a> {
    /// Offset at which this tag applies.
    pub offset: CodeOffset,

    /// Position at this offset ("post", attaching to prior
    /// instruction, or "pre", attaching to next instruction).
    pub pos: MachDebugTagPos,

    /// The underlying tags.
    pub tags: &'a [DebugTag],
}

/// Implementation of the `TextSectionBuilder` trait backed by `MachBuffer`.
///
/// Note that `MachBuffer` was primarily written for intra-function references
/// of jumps between basic blocks, but it's also quite usable for entire text
/// sections and resolving references between functions themselves. This
/// builder interprets "blocks" as labeled functions for the purposes of
/// resolving labels internally in the buffer.
pub struct MachTextSectionBuilder<I: VCodeInst> {
    buf: MachBuffer<I>,
    next_func: usize,
    force_veneers: ForceVeneers,
}

impl<I: VCodeInst> MachTextSectionBuilder<I> {
    /// Creates a new text section builder which will have `num_funcs` functions
    /// pushed into it.
    pub fn new(num_funcs: usize) -> MachTextSectionBuilder<I> {
        let mut buf = MachBuffer::new();
        buf.reserve_labels_for_blocks(num_funcs);
        MachTextSectionBuilder {
            buf,
            next_func: 0,
            force_veneers: ForceVeneers::No,
        }
    }
}

impl<I: VCodeInst> TextSectionBuilder for MachTextSectionBuilder<I> {
    fn append(
        &mut self,
        labeled: bool,
        func: &[u8],
        align: u32,
        ctrl_plane: &mut ControlPlane,
    ) -> u64 {
        // Conditionally emit an island if it's necessary to resolve jumps
        // between functions which are too far away.
        let size = func.len() as u32;
        if self.force_veneers == ForceVeneers::Yes || self.buf.island_needed(size) {
            self.buf
                .emit_island_maybe_forced(self.force_veneers, size, ctrl_plane);
        }

        self.buf.align_to(align);
        let pos = self.buf.cur_offset();
        if labeled {
            self.buf.bind_label(
                MachLabel::from_block(BlockIndex::new(self.next_func)),
                ctrl_plane,
            );
            self.next_func += 1;
        }
        self.buf.put_data(func);
        u64::from(pos)
    }

    fn resolve_reloc(&mut self, offset: u64, reloc: Reloc, addend: Addend, target: usize) -> bool {
        crate::trace!(
            "Resolving relocation @ {offset:#x} + {addend:#x} to target {target} of kind {reloc:?}"
        );
        let label = MachLabel::from_block(BlockIndex::new(target));
        let offset = u32::try_from(offset).unwrap();
        match I::LabelUse::from_reloc(reloc, addend) {
            Some(label_use) => {
                self.buf.use_label_at_offset(offset, label, label_use);
                true
            }
            None => false,
        }
    }

    fn force_veneers(&mut self) {
        self.force_veneers = ForceVeneers::Yes;
    }

    fn write(&mut self, offset: u64, data: &[u8]) {
        self.buf.data[offset.try_into().unwrap()..][..data.len()].copy_from_slice(data);
    }

    fn finish(&mut self, ctrl_plane: &mut ControlPlane) -> Vec<u8> {
        // Double-check all functions were pushed.
        assert_eq!(self.next_func, self.buf.label_offsets.len());

        // Finish up any veneers, if necessary.
        self.buf
            .finish_emission_maybe_forcing_veneers(self.force_veneers, ctrl_plane);

        // We don't need the data any more, so return it to the caller.
        mem::take(&mut self.buf.data).into_vec()
    }
}

// We use an actual instruction definition to do tests, so we depend on the `arm64` feature here.
#[cfg(all(test, feature = "arm64"))]
mod test {
    use cranelift_entity::EntityRef as _;

    use super::*;
    use crate::ir::UserExternalNameRef;
    use crate::isa::aarch64::inst::{BranchTarget, CondBrKind, EmitInfo, Inst};
    use crate::isa::aarch64::inst::{OperandSize, xreg};
    use crate::machinst::{MachInstEmit, MachInstEmitState};
    use crate::settings;

    fn label(n: u32) -> MachLabel {
        MachLabel::from_block(BlockIndex::new(n as usize))
    }
    fn target(n: u32) -> BranchTarget {
        BranchTarget::Label(label(n))
    }

    #[test]
    fn test_elide_jump_to_next() {
        let info = EmitInfo::new(settings::Flags::new(settings::builder()));
        let mut buf = MachBuffer::new();
        let mut state = <Inst as MachInstEmit>::State::default();
        let constants = Default::default();

        buf.reserve_labels_for_blocks(2);
        buf.bind_label(label(0), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(1) };
        inst.emit(&mut buf, &info, &mut state);
        buf.bind_label(label(1), state.ctrl_plane_mut());
        let buf = buf.finish(&constants, state.ctrl_plane_mut());
        assert_eq!(0, buf.total_size());
    }

    #[test]
    fn test_elide_trivial_jump_blocks() {
        let info = EmitInfo::new(settings::Flags::new(settings::builder()));
        let mut buf = MachBuffer::new();
        let mut state = <Inst as MachInstEmit>::State::default();
        let constants = Default::default();

        buf.reserve_labels_for_blocks(4);

        buf.bind_label(label(0), state.ctrl_plane_mut());
        let inst = Inst::CondBr {
            kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
            taken: target(1),
            not_taken: target(2),
        };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(1), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(3) };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(2), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(3) };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(3), state.ctrl_plane_mut());

        let buf = buf.finish(&constants, state.ctrl_plane_mut());
        assert_eq!(0, buf.total_size());
    }

    #[test]
    fn test_flip_cond() {
        let info = EmitInfo::new(settings::Flags::new(settings::builder()));
        let mut buf = MachBuffer::new();
        let mut state = <Inst as MachInstEmit>::State::default();
        let constants = Default::default();

        buf.reserve_labels_for_blocks(4);

        buf.bind_label(label(0), state.ctrl_plane_mut());
        let inst = Inst::CondBr {
            kind: CondBrKind::Zero(xreg(0), OperandSize::Size64),
            taken: target(1),
            not_taken: target(2),
        };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(1), state.ctrl_plane_mut());
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(2), state.ctrl_plane_mut());
        let inst = Inst::Udf {
            trap_code: TrapCode::STACK_OVERFLOW,
        };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(3), state.ctrl_plane_mut());

        let buf = buf.finish(&constants, state.ctrl_plane_mut());

        let mut buf2 = MachBuffer::new();
        let mut state = Default::default();
        let inst = Inst::TrapIf {
            kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
            trap_code: TrapCode::STACK_OVERFLOW,
        };
        inst.emit(&mut buf2, &info, &mut state);
        let inst = Inst::Nop4;
        inst.emit(&mut buf2, &info, &mut state);

        let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());

        assert_eq!(buf.data, buf2.data);
    }

    #[test]
    fn test_island() {
        let info = EmitInfo::new(settings::Flags::new(settings::builder()));
        let mut buf = MachBuffer::new();
        let mut state = <Inst as MachInstEmit>::State::default();
        let constants = Default::default();

        buf.reserve_labels_for_blocks(4);

        buf.bind_label(label(0), state.ctrl_plane_mut());
        let inst = Inst::CondBr {
            kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
            taken: target(2),
            not_taken: target(3),
        };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(1), state.ctrl_plane_mut());
        while buf.cur_offset() < 2000000 {
            if buf.island_needed(0) {
                buf.emit_island(0, state.ctrl_plane_mut());
            }
            let inst = Inst::Nop4;
            inst.emit(&mut buf, &info, &mut state);
        }

        buf.bind_label(label(2), state.ctrl_plane_mut());
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(3), state.ctrl_plane_mut());
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &info, &mut state);

        let buf = buf.finish(&constants, state.ctrl_plane_mut());

        assert_eq!(2000000 + 8, buf.total_size());

        let mut buf2 = MachBuffer::new();
        let mut state = Default::default();
        let inst = Inst::CondBr {
            kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),

            // This conditionally taken branch has a 19-bit constant, shifted
            // to the left by two, giving us a 21-bit range in total. Half of
            // this range is positive, so we should be around 1 << 20 bytes
            // away from our jump target.
            //
            // There are two pending fixups by the time we reach this point,
            // one for this 19-bit jump and one for the unconditional 26-bit
            // jump below. A 19-bit veneer is 4 bytes large and the 26-bit
            // veneer is 20 bytes large, so we pessimistically assume we'll
            // need two veneers. Currently each veneer is pessimistically
            // assumed to be the maximal size, which means we need 40 bytes of
            // extra space, so the actual island should come 40 bytes before
            // the deadline.
            taken: BranchTarget::ResolvedOffset((1 << 20) - 20 - 20),

            // This branch is in-range so no veneers should be needed, it should
            // go directly to the target.
            not_taken: BranchTarget::ResolvedOffset(2000000 + 4 - 4),
        };
        inst.emit(&mut buf2, &info, &mut state);

        let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());

        assert_eq!(&buf.data[0..8], &buf2.data[..]);
    }

    #[test]
    fn test_island_backward() {
        let info = EmitInfo::new(settings::Flags::new(settings::builder()));
        let mut buf = MachBuffer::new();
        let mut state = <Inst as MachInstEmit>::State::default();
        let constants = Default::default();

        buf.reserve_labels_for_blocks(4);

        buf.bind_label(label(0), state.ctrl_plane_mut());
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(1), state.ctrl_plane_mut());
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(2), state.ctrl_plane_mut());
        while buf.cur_offset() < 2000000 {
            let inst = Inst::Nop4;
            inst.emit(&mut buf, &info, &mut state);
        }

        buf.bind_label(label(3), state.ctrl_plane_mut());
        let inst = Inst::CondBr {
            kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
            taken: target(0),
            not_taken: target(1),
        };
        inst.emit(&mut buf, &info, &mut state);

        let buf = buf.finish(&constants, state.ctrl_plane_mut());

        assert_eq!(2000000 + 12, buf.total_size());

        let mut buf2 = MachBuffer::new();
        let mut state = Default::default();
        let inst = Inst::CondBr {
            kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
            taken: BranchTarget::ResolvedOffset(8),
            not_taken: BranchTarget::ResolvedOffset(4 - (2000000 + 4)),
        };
        inst.emit(&mut buf2, &info, &mut state);
        let inst = Inst::Jump {
            dest: BranchTarget::ResolvedOffset(-(2000000 + 8)),
        };
        inst.emit(&mut buf2, &info, &mut state);

        let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());

        assert_eq!(&buf.data[2000000..], &buf2.data[..]);
    }
2747
2748    #[test]
2749    fn test_multiple_redirect() {
2750        // label0:
2751        //   cbz x0, label1
2752        //   b label2
2753        // label1:
2754        //   b label3
2755        // label2:
2756        //   nop
2757        //   nop
2758        //   b label0
2759        // label3:
2760        //   b label4
2761        // label4:
2762        //   b label5
        // label5:
        //   b label7
        // label6:
        //   nop
        // label7:
        //   ret
        //
        // -- should become:
        //
        // label0:
        //   cbz x0, label7
        // label2:
        //   nop
        //   nop
        //   b label0
        // label6:
        //   nop
        // label7:
        //   ret

        let info = EmitInfo::new(settings::Flags::new(settings::builder()));
        let mut buf = MachBuffer::new();
        let mut state = <Inst as MachInstEmit>::State::default();
        let constants = Default::default();

        buf.reserve_labels_for_blocks(8);

        buf.bind_label(label(0), state.ctrl_plane_mut());
        let inst = Inst::CondBr {
            kind: CondBrKind::Zero(xreg(0), OperandSize::Size64),
            taken: target(1),
            not_taken: target(2),
        };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(1), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(3) };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(2), state.ctrl_plane_mut());
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &info, &mut state);
        inst.emit(&mut buf, &info, &mut state);
        let inst = Inst::Jump { dest: target(0) };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(3), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(4) };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(4), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(5) };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(5), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(7) };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(6), state.ctrl_plane_mut());
        let inst = Inst::Nop4;
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(7), state.ctrl_plane_mut());
        let inst = Inst::Ret {};
        inst.emit(&mut buf, &info, &mut state);

        let buf = buf.finish(&constants, state.ctrl_plane_mut());

        let golden_data = vec![
            0xa0, 0x00, 0x00, 0xb4, // cbz x0, 0x14
            0x1f, 0x20, 0x03, 0xd5, // nop
            0x1f, 0x20, 0x03, 0xd5, // nop
            0xfd, 0xff, 0xff, 0x17, // b 0
            0x1f, 0x20, 0x03, 0xd5, // nop
            0xc0, 0x03, 0x5f, 0xd6, // ret
        ];

        assert_eq!(&golden_data[..], &buf.data[..]);
    }

    #[test]
    fn test_handle_branch_cycle() {
        // label0:
        //   b label1
        // label1:
        //   b label2
        // label2:
        //   b label3
        // label3:
        //   b label4
        // label4:
        //   b label1  // note: not label0 (to make it interesting).
        //
        // -- should become:
        //
        // label0, label1, ..., label4:
        //   b label0
        let info = EmitInfo::new(settings::Flags::new(settings::builder()));
        let mut buf = MachBuffer::new();
        let mut state = <Inst as MachInstEmit>::State::default();
        let constants = Default::default();

        buf.reserve_labels_for_blocks(5);

        buf.bind_label(label(0), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(1) };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(1), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(2) };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(2), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(3) };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(3), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(4) };
        inst.emit(&mut buf, &info, &mut state);

        buf.bind_label(label(4), state.ctrl_plane_mut());
        let inst = Inst::Jump { dest: target(1) };
        inst.emit(&mut buf, &info, &mut state);

        let buf = buf.finish(&constants, state.ctrl_plane_mut());

        let golden_data = vec![
            0x00, 0x00, 0x00, 0x14, // b 0
        ];

        assert_eq!(&golden_data[..], &buf.data[..]);
    }

    #[test]
    fn metadata_records() {
        let mut buf = MachBuffer::<Inst>::new();
        let ctrl_plane = &mut Default::default();
        let constants = Default::default();

        buf.reserve_labels_for_blocks(3);

        buf.bind_label(label(0), ctrl_plane);
        buf.put1(1);
        buf.add_trap(TrapCode::HEAP_OUT_OF_BOUNDS);
        buf.put1(2);
        buf.add_trap(TrapCode::INTEGER_OVERFLOW);
        buf.add_trap(TrapCode::INTEGER_DIVISION_BY_ZERO);
        buf.add_try_call_site(
            Some(0x10),
            [
                MachExceptionHandler::Tag(ExceptionTag::new(42), label(2)),
                MachExceptionHandler::Default(label(1)),
            ]
            .into_iter(),
        );
        buf.add_reloc(
            Reloc::Abs4,
            &ExternalName::User(UserExternalNameRef::new(0)),
            0,
        );
        buf.put1(3);
        buf.add_reloc(
            Reloc::Abs8,
            &ExternalName::User(UserExternalNameRef::new(1)),
            1,
        );
        buf.put1(4);
        buf.bind_label(label(1), ctrl_plane);
        buf.put1(0xff);
        buf.bind_label(label(2), ctrl_plane);
        buf.put1(0xff);

        let buf = buf.finish(&constants, ctrl_plane);

        assert_eq!(buf.data(), &[1, 2, 3, 4, 0xff, 0xff]);
        assert_eq!(
            buf.traps()
                .iter()
                .map(|trap| (trap.offset, trap.code))
                .collect::<Vec<_>>(),
            vec![
                (1, TrapCode::HEAP_OUT_OF_BOUNDS),
                (2, TrapCode::INTEGER_OVERFLOW),
                (2, TrapCode::INTEGER_DIVISION_BY_ZERO)
            ]
        );
        let call_sites: Vec<_> = buf.call_sites().collect();
        assert_eq!(call_sites[0].ret_addr, 2);
        assert_eq!(call_sites[0].frame_offset, Some(0x10));
        assert_eq!(
            call_sites[0].exception_handlers,
            &[
                FinalizedMachExceptionHandler::Tag(ExceptionTag::new(42), 5),
                FinalizedMachExceptionHandler::Default(4)
            ],
        );
        assert_eq!(
            buf.relocs()
                .iter()
                .map(|reloc| (reloc.offset, reloc.kind))
                .collect::<Vec<_>>(),
            vec![(2, Reloc::Abs4), (3, Reloc::Abs8)]
        );
    }
}