cranelift_codegen/machinst/buffer.rs
1//! In-memory representation of compiled machine code, with labels and fixups to
2//! refer to those labels. Handles constant-pool island insertion and also
3//! veneer insertion for out-of-range jumps.
4//!
5//! This code exists to solve three problems:
6//!
7//! - Branch targets for forward branches are not known until later, when we
8//! emit code in a single pass through the instruction structs.
9//!
10//! - On many architectures, address references or offsets have limited range.
11//! For example, on AArch64, conditional branches can only target code +/- 1MB
12//! from the branch itself.
13//!
14//! - The lowering of control flow from the CFG-with-edges produced by
15//! [BlockLoweringOrder](super::BlockLoweringOrder), combined with many empty
16//! edge blocks when the register allocator does not need to insert any
17//! spills/reloads/moves in edge blocks, results in many suboptimal branch
18//! patterns. The lowering also pays no attention to block order, and so
19//! two-target conditional forms (cond-br followed by uncond-br) can often by
20//! avoided because one of the targets is the fallthrough. There are several
21//! cases here where we can simplify to use fewer branches.
22//!
23//! This "buffer" implements a single-pass code emission strategy (with a later
24//! "fixup" pass, but only through recorded fixups, not all instructions). The
25//! basic idea is:
26//!
27//! - Emit branches as they are, including two-target (cond/uncond) compound
28//! forms, but with zero offsets and optimistically assuming the target will be
29//! in range. Record the "fixup" for later. Targets are denoted instead by
30//! symbolic "labels" that are then bound to certain offsets in the buffer as
31//! we emit code. (Nominally, there is a label at the start of every basic
32//! block.)
33//!
34//! - As we do this, track the offset in the buffer at which the first label
35//! reference "goes out of range". We call this the "deadline". If we reach the
36//! deadline and we still have not bound the label to which an unresolved branch
37//! refers, we have a problem!
38//!
39//! - To solve this problem, we emit "islands" full of "veneers". An island is
40//! simply a chunk of code inserted in the middle of the code actually produced
41//! by the emitter (e.g., vcode iterating over instruction structs). The emitter
42//! has some awareness of this: it either asks for an island between blocks, so
43//! it is not accidentally executed, or else it emits a branch around the island
44//! when all other options fail (see `Inst::EmitIsland` meta-instruction).
45//!
46//! - A "veneer" is an instruction (or sequence of instructions) in an "island"
47//! that implements a longer-range reference to a label. The idea is that, for
48//! example, a branch with a limited range can branch to a "veneer" instead,
49//! which is simply a branch in a form that can use a longer-range reference. On
50//! AArch64, for example, conditionals have a +/- 1 MB range, but a conditional
51//! can branch to an unconditional branch which has a +/- 128 MB range. Hence, a
52//! conditional branch's label reference can be fixed up with a "veneer" to
53//! achieve a longer range.
54//!
55//! - To implement all of this, we require the backend to provide a `LabelUse`
56//! type that implements a trait. This is nominally an enum that records one of
57//! several kinds of references to an offset in code -- basically, a relocation
58//! type -- and will usually correspond to different instruction formats. The
59//! `LabelUse` implementation specifies the maximum range, how to patch in the
60//! actual label location when known, and how to generate a veneer to extend the
61//! range.
62//!
63//! That satisfies label references, but we still may have suboptimal branch
64//! patterns. To clean up the branches, we do a simple "peephole"-style
65//! optimization on the fly. To do so, the emitter (e.g., `Inst::emit()`)
66//! informs the buffer of branches in the code and, in the case of conditionals,
67//! the code that would have been emitted to invert this branch's condition. We
68//! track the "latest branches": these are branches that are contiguous up to
69//! the current offset. (If any code is emitted after a branch, that branch or
70//! run of contiguous branches is no longer "latest".) The latest branches are
71//! those that we can edit by simply truncating the buffer and doing something
72//! else instead.
73//!
74//! To optimize branches, we implement several simple rules, and try to apply
75//! them to the "latest branches" when possible:
76//!
77//! - A branch with a label target, when that label is bound to the ending
78//! offset of the branch (the fallthrough location), can be removed altogether,
79//! because the branch would have no effect).
80//!
81//! - An unconditional branch that starts at a label location, and branches to
82//! another label, results in a "label alias": all references to the label bound
83//! *to* this branch instruction are instead resolved to the *target* of the
84//! branch instruction. This effectively removes empty blocks that just
85//! unconditionally branch to the next block. We call this "branch threading".
86//!
87//! - A conditional followed by an unconditional, when the conditional branches
88//! to the unconditional's fallthrough, results in (i) the truncation of the
89//! unconditional, (ii) the inversion of the condition's condition, and (iii)
90//! replacement of the conditional's target (using the original target of the
91//! unconditional). This is a fancy way of saying "we can flip a two-target
92//! conditional branch's taken/not-taken targets if it works better with our
93//! fallthrough". To make this work, the emitter actually gives the buffer
94//! *both* forms of every conditional branch: the true form is emitted into the
95//! buffer, and the "inverted" machine-code bytes are provided as part of the
96//! branch-fixup metadata.
97//!
98//! - An unconditional B preceded by another unconditional P, when B's label(s) have
99//! been redirected to target(B), can be removed entirely. This is an extension
100//! of the branch-threading optimization, and is valid because if we know there
101//! will be no fallthrough into this branch instruction (the prior instruction
102//! is an unconditional jump), and if we know we have successfully redirected
103//! all labels, then this branch instruction is unreachable. Note that this
104//! works because the redirection happens before the label is ever resolved
105//! (fixups happen at island emission time, at which point latest-branches are
106//! cleared, or at the end of emission), so we are sure to catch and redirect
107//! all possible paths to this instruction.
108//!
109//! # Branch-optimization Correctness
110//!
111//! The branch-optimization mechanism depends on a few data structures with
112//! invariants, which are always held outside the scope of top-level public
113//! methods:
114//!
115//! - The latest-branches list. Each entry describes a span of the buffer
116//! (start/end offsets), the label target, the corresponding fixup-list entry
117//! index, and the bytes (must be the same length) for the inverted form, if
118//! conditional. The list of labels that are bound to the start-offset of this
119//! branch is *complete* (if any label has a resolved offset equal to `start`
120//! and is not an alias, it must appear in this list) and *precise* (no label
121//! in this list can be bound to another offset). No label in this list should
122//! be an alias. No two branch ranges can overlap, and branches are in
123//! ascending-offset order.
124//!
125//! - The labels-at-tail list. This contains all MachLabels that have been bound
126//! to (whose resolved offsets are equal to) the tail offset of the buffer.
127//! No label in this list should be an alias.
128//!
129//! - The label_offsets array, containing the bound offset of a label or
130//! UNKNOWN. No label can be bound at an offset greater than the current
131//! buffer tail.
132//!
133//! - The label_aliases array, containing another label to which a label is
134//! bound or UNKNOWN. A label's resolved offset is the resolved offset
135//! of the label it is aliased to, if this is set.
136//!
137//! We argue below, at each method, how the invariants in these data structures
138//! are maintained (grep for "Post-invariant").
139//!
140//! Given these invariants, we argue why each optimization preserves execution
141//! semantics below (grep for "Preserves execution semantics").
142//!
143//! # Avoiding Quadratic Behavior
144//!
145//! There are two cases where we've had to take some care to avoid
146//! quadratic worst-case behavior:
147//!
148//! - The "labels at this branch" list can grow unboundedly if the
149//! code generator binds many labels at one location. If the count
150//! gets too high (defined by the `LABEL_LIST_THRESHOLD` constant), we
151//! simply abort an optimization early in a way that is always correct
152//! but is conservative.
153//!
154//! - The fixup list can interact with island emission to create
155//! "quadratic island behavior". In a little more detail, one can hit
156//! this behavior by having some pending fixups (forward label
157//! references) with long-range label-use kinds, and some others
158//! with shorter-range references that nonetheless still are pending
159//! long enough to trigger island generation. In such a case, we
160//! process the fixup list, generate veneers to extend some forward
161//! references' ranges, but leave the other (longer-range) ones
162//! alone. The way this was implemented put them back on a list and
163//! resulted in quadratic behavior.
164//!
165//! To avoid this fixups are split into two lists: one "pending" list and one
166//! final list. The pending list is kept around for handling fixups related to
167//! branches so it can be edited/truncated. When an island is reached, which
168//! starts processing fixups, all pending fixups are flushed into the final
169//! list. The final list is a `BinaryHeap` which enables fixup processing to
170//! only process those which are required during island emission, deferring
171//! all longer-range fixups to later.
172
173use crate::binemit::{Addend, CodeOffset, Reloc};
174use crate::ir::function::FunctionParameters;
175use crate::ir::{DebugTag, ExceptionTag, ExternalName, RelSourceLoc, SourceLoc, TrapCode};
176use crate::isa::unwind::UnwindInst;
177use crate::machinst::{
178 BlockIndex, MachInstLabelUse, TextSectionBuilder, VCodeConstant, VCodeConstants, VCodeInst,
179};
180use crate::trace;
181use crate::{MachInstEmitState, ir};
182use crate::{VCodeConstantData, timing};
183use core::ops::Range;
184use cranelift_control::ControlPlane;
185use cranelift_entity::{PrimaryMap, SecondaryMap, entity_impl};
186use smallvec::SmallVec;
187use std::cmp::Ordering;
188use std::collections::BinaryHeap;
189use std::mem;
190use std::string::String;
191use std::vec::Vec;
192
193#[cfg(feature = "enable-serde")]
194use serde::{Deserialize, Serialize};
195
196#[cfg(feature = "enable-serde")]
197pub trait CompilePhase {
198 type MachSrcLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone;
199 type SourceLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone;
200}
201
202#[cfg(not(feature = "enable-serde"))]
203pub trait CompilePhase {
204 type MachSrcLocType: core::fmt::Debug + PartialEq + Clone;
205 type SourceLocType: core::fmt::Debug + PartialEq + Clone;
206}
207
208/// Status of a compiled artifact that needs patching before being used.
209#[derive(Clone, Debug, PartialEq)]
210#[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
211pub struct Stencil;
212
213/// Status of a compiled artifact ready to use.
214#[derive(Clone, Debug, PartialEq)]
215pub struct Final;
216
217impl CompilePhase for Stencil {
218 type MachSrcLocType = MachSrcLoc<Stencil>;
219 type SourceLocType = RelSourceLoc;
220}
221
222impl CompilePhase for Final {
223 type MachSrcLocType = MachSrcLoc<Final>;
224 type SourceLocType = SourceLoc;
225}
226
227#[derive(Clone, Copy, Debug, PartialEq, Eq)]
228enum ForceVeneers {
229 Yes,
230 No,
231}
232
233/// A buffer of output to be produced, fixed up, and then emitted to a CodeSink
234/// in bulk.
235///
236/// This struct uses `SmallVec`s to support small-ish function bodies without
237/// any heap allocation. As such, it will be several kilobytes large. This is
238/// likely fine as long as it is stack-allocated for function emission then
239/// thrown away; but beware if many buffer objects are retained persistently.
240pub struct MachBuffer<I: VCodeInst> {
241 /// The buffer contents, as raw bytes.
242 data: SmallVec<[u8; 1024]>,
243 /// The required alignment of this buffer.
244 min_alignment: u32,
245 /// Any relocations referring to this code. Note that only *external*
246 /// relocations are tracked here; references to labels within the buffer are
247 /// resolved before emission.
248 relocs: SmallVec<[MachReloc; 16]>,
249 /// Any trap records referring to this code.
250 traps: SmallVec<[MachTrap; 16]>,
251 /// Any call site records referring to this code.
252 call_sites: SmallVec<[MachCallSite; 16]>,
253 /// Any patchable call site locations.
254 patchable_call_sites: SmallVec<[MachPatchableCallSite; 16]>,
255 /// Any exception-handler records referred to at call sites.
256 exception_handlers: SmallVec<[MachExceptionHandler; 16]>,
257 /// Any source location mappings referring to this code.
258 srclocs: SmallVec<[MachSrcLoc<Stencil>; 64]>,
259 /// Any debug tags referring to this code.
260 debug_tags: Vec<MachDebugTags>,
261 /// Pool of debug tags referenced by `MachDebugTags` entries.
262 debug_tag_pool: Vec<DebugTag>,
263 /// Any user stack maps for this code.
264 ///
265 /// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted
266 /// by code offset, and each stack map covers `span` bytes on the stack.
267 user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>,
268 /// Any unwind info at a given location.
269 unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>,
270 /// The current source location in progress (after `start_srcloc()` and
271 /// before `end_srcloc()`). This is a (start_offset, src_loc) tuple.
272 cur_srcloc: Option<(CodeOffset, RelSourceLoc)>,
273 /// Known label offsets; `UNKNOWN_LABEL_OFFSET` if unknown.
274 label_offsets: SmallVec<[CodeOffset; 16]>,
275 /// Label aliases: when one label points to an unconditional jump, and that
276 /// jump points to another label, we can redirect references to the first
277 /// label immediately to the second.
278 ///
279 /// Invariant: we don't have label-alias cycles. We ensure this by,
280 /// before setting label A to alias label B, resolving B's alias
281 /// target (iteratively until a non-aliased label); if B is already
282 /// aliased to A, then we cannot alias A back to B.
283 label_aliases: SmallVec<[MachLabel; 16]>,
284 /// Constants that must be emitted at some point.
285 pending_constants: SmallVec<[VCodeConstant; 16]>,
286 /// Byte size of all constants in `pending_constants`.
287 pending_constants_size: CodeOffset,
288 /// Traps that must be emitted at some point.
289 pending_traps: SmallVec<[MachLabelTrap; 16]>,
290 /// Fixups that haven't yet been flushed into `fixup_records` below and may
291 /// be related to branches that are chomped. These all get added to
292 /// `fixup_records` during island emission.
293 pending_fixup_records: SmallVec<[MachLabelFixup<I>; 16]>,
294 /// The nearest upcoming deadline for entries in `pending_fixup_records`.
295 pending_fixup_deadline: CodeOffset,
296 /// Fixups that must be performed after all code is emitted.
297 fixup_records: BinaryHeap<MachLabelFixup<I>>,
298 /// Latest branches, to facilitate in-place editing for better fallthrough
299 /// behavior and empty-block removal.
300 latest_branches: SmallVec<[MachBranch; 4]>,
301 /// All labels at the current offset (emission tail). This is lazily
302 /// cleared: it is actually accurate as long as the current offset is
303 /// `labels_at_tail_off`, but if `cur_offset()` has grown larger, it should
304 /// be considered as empty.
305 ///
306 /// For correctness, this *must* be complete (i.e., the vector must contain
307 /// all labels whose offsets are resolved to the current tail), because we
308 /// rely on it to update labels when we truncate branches.
309 labels_at_tail: SmallVec<[MachLabel; 4]>,
310 /// The last offset at which `labels_at_tail` is valid. It is conceptually
311 /// always describing the tail of the buffer, but we do not clear
312 /// `labels_at_tail` eagerly when the tail grows, rather we lazily clear it
313 /// when the offset has grown past this (`labels_at_tail_off`) point.
314 /// Always <= `cur_offset()`.
315 labels_at_tail_off: CodeOffset,
316 /// Metadata about all constants that this function has access to.
317 ///
318 /// This records the size/alignment of all constants (not the actual data)
319 /// along with the last available label generated for the constant. This map
320 /// is consulted when constants are referred to and the label assigned to a
321 /// constant may change over time as well.
322 constants: PrimaryMap<VCodeConstant, MachBufferConstant>,
323 /// All recorded usages of constants as pairs of the constant and where the
324 /// constant needs to be placed within `self.data`. Note that the same
325 /// constant may appear in this array multiple times if it was emitted
326 /// multiple times.
327 used_constants: SmallVec<[(VCodeConstant, CodeOffset); 4]>,
328 /// Indicates when a patchable region is currently open, to guard that it's
329 /// not possible to nest patchable regions.
330 open_patchable: bool,
331 /// Stack frame layout metadata. If provided for a MachBuffer
332 /// containing a function body, this allows interpretation of
333 /// runtime state given a view of an active stack frame.
334 frame_layout: Option<MachBufferFrameLayout>,
335}
336
337impl MachBufferFinalized<Stencil> {
338 /// Get a finalized machine buffer by applying the function's base source location.
339 pub fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachBufferFinalized<Final> {
340 MachBufferFinalized {
341 data: self.data,
342 relocs: self.relocs,
343 traps: self.traps,
344 call_sites: self.call_sites,
345 patchable_call_sites: self.patchable_call_sites,
346 exception_handlers: self.exception_handlers,
347 srclocs: self
348 .srclocs
349 .into_iter()
350 .map(|srcloc| srcloc.apply_base_srcloc(base_srcloc))
351 .collect(),
352 debug_tags: self.debug_tags,
353 debug_tag_pool: self.debug_tag_pool,
354 user_stack_maps: self.user_stack_maps,
355 unwind_info: self.unwind_info,
356 alignment: self.alignment,
357 frame_layout: self.frame_layout,
358 nop_units: self.nop_units,
359 }
360 }
361}
362
363/// A `MachBuffer` once emission is completed: holds generated code and records,
364/// without fixups. This allows the type to be independent of the backend.
365#[derive(PartialEq, Debug, Clone)]
366#[cfg_attr(
367 feature = "enable-serde",
368 derive(serde_derive::Serialize, serde_derive::Deserialize)
369)]
370pub struct MachBufferFinalized<T: CompilePhase> {
371 /// The buffer contents, as raw bytes.
372 pub(crate) data: SmallVec<[u8; 1024]>,
373 /// Any relocations referring to this code. Note that only *external*
374 /// relocations are tracked here; references to labels within the buffer are
375 /// resolved before emission.
376 pub(crate) relocs: SmallVec<[FinalizedMachReloc; 16]>,
377 /// Any trap records referring to this code.
378 pub(crate) traps: SmallVec<[MachTrap; 16]>,
379 /// Any call site records referring to this code.
380 pub(crate) call_sites: SmallVec<[MachCallSite; 16]>,
381 /// Any patchable call site locations refering to this code.
382 pub(crate) patchable_call_sites: SmallVec<[MachPatchableCallSite; 16]>,
383 /// Any exception-handler records referred to at call sites.
384 pub(crate) exception_handlers: SmallVec<[FinalizedMachExceptionHandler; 16]>,
385 /// Any source location mappings referring to this code.
386 pub(crate) srclocs: SmallVec<[T::MachSrcLocType; 64]>,
387 /// Any debug tags referring to this code.
388 pub(crate) debug_tags: Vec<MachDebugTags>,
389 /// Pool of debug tags referenced by `MachDebugTags` entries.
390 pub(crate) debug_tag_pool: Vec<DebugTag>,
391 /// Any user stack maps for this code.
392 ///
393 /// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted
394 /// by code offset, and each stack map covers `span` bytes on the stack.
395 pub(crate) user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>,
396 /// Stack frame layout metadata. If provided for a MachBuffer
397 /// containing a function body, this allows interpretation of
398 /// runtime state given a view of an active stack frame.
399 pub(crate) frame_layout: Option<MachBufferFrameLayout>,
400 /// Any unwind info at a given location.
401 pub unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>,
402 /// The required alignment of this buffer.
403 pub alignment: u32,
404 /// The means by which to NOP out patchable call sites.
405 ///
406 /// This allows a consumer of a `MachBufferFinalized` to disable
407 /// patchable call sites (which are enabled by default) without
408 /// specific knowledge of the target ISA.
409 ///
410 /// Each entry is one form of nop, and these are required to be
411 /// sorted in ascending-size order.
412 pub nop_units: Vec<Vec<u8>>,
413}
414
415const UNKNOWN_LABEL_OFFSET: CodeOffset = 0xffff_ffff;
416const UNKNOWN_LABEL: MachLabel = MachLabel(0xffff_ffff);
417
418/// Threshold on max length of `labels_at_this_branch` list to avoid
419/// unbounded quadratic behavior (see comment below at use-site).
420const LABEL_LIST_THRESHOLD: usize = 100;
421
422/// A label refers to some offset in a `MachBuffer`. It may not be resolved at
423/// the point at which it is used by emitted code; the buffer records "fixups"
424/// for references to the label, and will come back and patch the code
425/// appropriately when the label's location is eventually known.
426#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
427pub struct MachLabel(u32);
428entity_impl!(MachLabel);
429
430impl MachLabel {
431 /// Get a label for a block. (The first N MachLabels are always reserved for
432 /// the N blocks in the vcode.)
433 pub fn from_block(bindex: BlockIndex) -> MachLabel {
434 MachLabel(bindex.index() as u32)
435 }
436
437 /// Creates a string representing this label, for convenience.
438 pub fn to_string(&self) -> String {
439 format!("label{}", self.0)
440 }
441}
442
443impl Default for MachLabel {
444 fn default() -> Self {
445 UNKNOWN_LABEL
446 }
447}
448
449/// Represents the beginning of an editable region in the [`MachBuffer`], while code emission is
450/// still occurring. An [`OpenPatchRegion`] is closed by [`MachBuffer::end_patchable`], consuming
451/// the [`OpenPatchRegion`] token in the process.
452pub struct OpenPatchRegion(usize);
453
454/// A region in the [`MachBuffer`] code buffer that can be edited prior to finalization. An example
455/// of where you might want to use this is for patching instructions that mention constants that
456/// won't be known until later: [`MachBuffer::start_patchable`] can be used to begin the patchable
457/// region, instructions can be emitted with placeholder constants, and the [`PatchRegion`] token
458/// can be produced by [`MachBuffer::end_patchable`]. Once the values of those constants are known,
459/// the [`PatchRegion::patch`] function can be used to get a mutable buffer to the instruction
460/// bytes, and the constants uses can be updated directly.
461pub struct PatchRegion {
462 range: Range<usize>,
463}
464
465impl PatchRegion {
466 /// Consume the patch region to yield a mutable slice of the [`MachBuffer`] data buffer.
467 pub fn patch<I: VCodeInst>(self, buffer: &mut MachBuffer<I>) -> &mut [u8] {
468 &mut buffer.data[self.range]
469 }
470}
471
472impl<I: VCodeInst> MachBuffer<I> {
473 /// Create a new section, known to start at `start_offset` and with a size limited to
474 /// `length_limit`.
475 pub fn new() -> MachBuffer<I> {
476 MachBuffer {
477 data: SmallVec::new(),
478 min_alignment: I::function_alignment().minimum,
479 relocs: SmallVec::new(),
480 traps: SmallVec::new(),
481 call_sites: SmallVec::new(),
482 patchable_call_sites: SmallVec::new(),
483 exception_handlers: SmallVec::new(),
484 srclocs: SmallVec::new(),
485 debug_tags: vec![],
486 debug_tag_pool: vec![],
487 user_stack_maps: SmallVec::new(),
488 unwind_info: SmallVec::new(),
489 cur_srcloc: None,
490 label_offsets: SmallVec::new(),
491 label_aliases: SmallVec::new(),
492 pending_constants: SmallVec::new(),
493 pending_constants_size: 0,
494 pending_traps: SmallVec::new(),
495 pending_fixup_records: SmallVec::new(),
496 pending_fixup_deadline: u32::MAX,
497 fixup_records: Default::default(),
498 latest_branches: SmallVec::new(),
499 labels_at_tail: SmallVec::new(),
500 labels_at_tail_off: 0,
501 constants: Default::default(),
502 used_constants: Default::default(),
503 open_patchable: false,
504 frame_layout: None,
505 }
506 }
507
508 /// Current offset from start of buffer.
509 pub fn cur_offset(&self) -> CodeOffset {
510 self.data.len() as CodeOffset
511 }
512
513 /// Add a byte.
514 pub fn put1(&mut self, value: u8) {
515 self.data.push(value);
516
517 // Post-invariant: conceptual-labels_at_tail contains a complete and
518 // precise list of labels bound at `cur_offset()`. We have advanced
519 // `cur_offset()`, hence if it had been equal to `labels_at_tail_off`
520 // before, it is not anymore (and it cannot become equal, because
521 // `labels_at_tail_off` is always <= `cur_offset()`). Thus the list is
522 // conceptually empty (even though it is only lazily cleared). No labels
523 // can be bound at this new offset (by invariant on `label_offsets`).
524 // Hence the invariant holds.
525 }
526
527 /// Add 2 bytes.
528 pub fn put2(&mut self, value: u16) {
529 let bytes = value.to_le_bytes();
530 self.data.extend_from_slice(&bytes[..]);
531
532 // Post-invariant: as for `put1()`.
533 }
534
535 /// Add 4 bytes.
536 pub fn put4(&mut self, value: u32) {
537 let bytes = value.to_le_bytes();
538 self.data.extend_from_slice(&bytes[..]);
539
540 // Post-invariant: as for `put1()`.
541 }
542
543 /// Add 8 bytes.
544 pub fn put8(&mut self, value: u64) {
545 let bytes = value.to_le_bytes();
546 self.data.extend_from_slice(&bytes[..]);
547
548 // Post-invariant: as for `put1()`.
549 }
550
551 /// Add a slice of bytes.
552 pub fn put_data(&mut self, data: &[u8]) {
553 self.data.extend_from_slice(data);
554
555 // Post-invariant: as for `put1()`.
556 }
557
558 /// Reserve appended space and return a mutable slice referring to it.
559 pub fn get_appended_space(&mut self, len: usize) -> &mut [u8] {
560 let off = self.data.len();
561 let new_len = self.data.len() + len;
562 self.data.resize(new_len, 0);
563 &mut self.data[off..]
564
565 // Post-invariant: as for `put1()`.
566 }
567
568 /// Align up to the given alignment.
569 pub fn align_to(&mut self, align_to: CodeOffset) {
570 trace!("MachBuffer: align to {}", align_to);
571 assert!(
572 align_to.is_power_of_two(),
573 "{align_to} is not a power of two"
574 );
575 while self.cur_offset() & (align_to - 1) != 0 {
576 self.put1(0);
577 }
578
579 // Post-invariant: as for `put1()`.
580 }
581
582 /// Begin a region of patchable code. There is one requirement for the
583 /// code that is emitted: It must not introduce any instructions that
584 /// could be chomped (branches are an example of this). In other words,
585 /// you must not call [`MachBuffer::add_cond_branch`] or
586 /// [`MachBuffer::add_uncond_branch`] between calls to this method and
587 /// [`MachBuffer::end_patchable`].
588 pub fn start_patchable(&mut self) -> OpenPatchRegion {
589 assert!(!self.open_patchable, "Patchable regions may not be nested");
590 self.open_patchable = true;
591 OpenPatchRegion(usize::try_from(self.cur_offset()).unwrap())
592 }
593
594 /// End a region of patchable code, yielding a [`PatchRegion`] value that
595 /// can be consumed later to produce a one-off mutable slice to the
596 /// associated region of the data buffer.
597 pub fn end_patchable(&mut self, open: OpenPatchRegion) -> PatchRegion {
598 // No need to assert the state of `open_patchable` here, as we take
599 // ownership of the only `OpenPatchable` value.
600 self.open_patchable = false;
601 let end = usize::try_from(self.cur_offset()).unwrap();
602 PatchRegion { range: open.0..end }
603 }
604
605 /// Allocate a `Label` to refer to some offset. May not be bound to a fixed
606 /// offset yet.
607 pub fn get_label(&mut self) -> MachLabel {
608 let l = self.label_offsets.len() as u32;
609 self.label_offsets.push(UNKNOWN_LABEL_OFFSET);
610 self.label_aliases.push(UNKNOWN_LABEL);
611 trace!("MachBuffer: new label -> {:?}", MachLabel(l));
612 MachLabel(l)
613
614 // Post-invariant: the only mutation is to add a new label; it has no
615 // bound offset yet, so it trivially satisfies all invariants.
616 }
617
618 /// Reserve the first N MachLabels for blocks.
619 pub fn reserve_labels_for_blocks(&mut self, blocks: usize) {
620 trace!("MachBuffer: first {} labels are for blocks", blocks);
621 debug_assert!(self.label_offsets.is_empty());
622 self.label_offsets.resize(blocks, UNKNOWN_LABEL_OFFSET);
623 self.label_aliases.resize(blocks, UNKNOWN_LABEL);
624
625 // Post-invariant: as for `get_label()`.
626 }
627
628 /// Registers metadata in this `MachBuffer` about the `constants` provided.
629 ///
630 /// This will record the size/alignment of all constants which will prepare
631 /// them for emission later on.
632 pub fn register_constants(&mut self, constants: &VCodeConstants) {
633 for (c, val) in constants.iter() {
634 self.register_constant(&c, val);
635 }
636 }
637
638 /// Similar to [`MachBuffer::register_constants`] but registers a
639 /// single constant metadata. This function is useful in
640 /// situations where not all constants are known at the time of
641 /// emission.
642 pub fn register_constant(&mut self, constant: &VCodeConstant, data: &VCodeConstantData) {
643 let c2 = self.constants.push(MachBufferConstant {
644 upcoming_label: None,
645 align: data.alignment(),
646 size: data.as_slice().len(),
647 });
648 assert_eq!(*constant, c2);
649 }
650
651 /// Completes constant emission by iterating over `self.used_constants` and
652 /// filling in the "holes" with the constant values provided by `constants`.
653 ///
654 /// Returns the alignment required for this entire buffer. Alignment starts
655 /// at the ISA's minimum function alignment and can be increased due to
656 /// constant requirements.
657 fn finish_constants(&mut self, constants: &VCodeConstants) -> u32 {
658 let mut alignment = self.min_alignment;
659 for (constant, offset) in mem::take(&mut self.used_constants) {
660 let constant = constants.get(constant);
661 let data = constant.as_slice();
662 self.data[offset as usize..][..data.len()].copy_from_slice(data);
663 alignment = constant.alignment().max(alignment);
664 }
665 alignment
666 }
667
668 /// Returns a label that can be used to refer to the `constant` provided.
669 ///
670 /// This will automatically defer a new constant to be emitted for
671 /// `constant` if it has not been previously emitted. Note that this
672 /// function may return a different label for the same constant at
673 /// different points in time. The label is valid to use only from the
674 /// current location; the MachBuffer takes care to emit the same constant
675 /// multiple times if needed so the constant is always in range.
676 pub fn get_label_for_constant(&mut self, constant: VCodeConstant) -> MachLabel {
677 let MachBufferConstant {
678 align,
679 size,
680 upcoming_label,
681 } = self.constants[constant];
682 if let Some(label) = upcoming_label {
683 return label;
684 }
685
686 let label = self.get_label();
687 trace!(
688 "defer constant: eventually emit {size} bytes aligned \
689 to {align} at label {label:?}",
690 );
691 self.pending_constants.push(constant);
692 self.pending_constants_size += size as u32;
693 self.constants[constant].upcoming_label = Some(label);
694 label
695 }
696
697 /// Bind a label to the current offset. A label can only be bound once.
698 pub fn bind_label(&mut self, label: MachLabel, ctrl_plane: &mut ControlPlane) {
699 trace!(
700 "MachBuffer: bind label {:?} at offset {}",
701 label,
702 self.cur_offset()
703 );
704 debug_assert_eq!(self.label_offsets[label.0 as usize], UNKNOWN_LABEL_OFFSET);
705 debug_assert_eq!(self.label_aliases[label.0 as usize], UNKNOWN_LABEL);
706 let offset = self.cur_offset();
707 self.label_offsets[label.0 as usize] = offset;
708 self.lazily_clear_labels_at_tail();
709 self.labels_at_tail.push(label);
710
711 // Invariants hold: bound offset of label is <= cur_offset (in fact it
712 // is equal). If the `labels_at_tail` list was complete and precise
713 // before, it is still, because we have bound this label to the current
714 // offset and added it to the list (which contains all labels at the
715 // current offset).
716
717 self.optimize_branches(ctrl_plane);
718
719 // Post-invariant: by `optimize_branches()` (see argument there).
720 }
721
722 /// Lazily clear `labels_at_tail` if the tail offset has moved beyond the
723 /// offset that it applies to.
724 fn lazily_clear_labels_at_tail(&mut self) {
725 let offset = self.cur_offset();
726 if offset > self.labels_at_tail_off {
727 self.labels_at_tail_off = offset;
728 self.labels_at_tail.clear();
729 }
730
731 // Post-invariant: either labels_at_tail_off was at cur_offset, and
732 // state is untouched, or was less than cur_offset, in which case the
733 // labels_at_tail list was conceptually empty, and is now actually
734 // empty.
735 }
736
737 /// Resolve a label to an offset, if known. May return `UNKNOWN_LABEL_OFFSET`.
738 pub(crate) fn resolve_label_offset(&self, mut label: MachLabel) -> CodeOffset {
739 let mut iters = 0;
740 while self.label_aliases[label.0 as usize] != UNKNOWN_LABEL {
741 label = self.label_aliases[label.0 as usize];
742 // To protect against an infinite loop (despite our assurances to
743 // ourselves that the invariants make this impossible), assert out
744 // after 1M iterations. The number of basic blocks is limited
745 // in most contexts anyway so this should be impossible to hit with
746 // a legitimate input.
747 iters += 1;
748 assert!(iters < 1_000_000, "Unexpected cycle in label aliases");
749 }
750 self.label_offsets[label.0 as usize]
751
752 // Post-invariant: no mutations.
753 }
754
755 /// Emit a reference to the given label with the given reference type (i.e.,
756 /// branch-instruction format) at the current offset. This is like a
757 /// relocation, but handled internally.
758 ///
759 /// This can be called before the branch is actually emitted; fixups will
760 /// not happen until an island is emitted or the buffer is finished.
761 pub fn use_label_at_offset(&mut self, offset: CodeOffset, label: MachLabel, kind: I::LabelUse) {
762 trace!(
763 "MachBuffer: use_label_at_offset: offset {} label {:?} kind {:?}",
764 offset, label, kind
765 );
766
767 // Add the fixup, and update the worst-case island size based on a
768 // veneer for this label use.
769 let fixup = MachLabelFixup {
770 label,
771 offset,
772 kind,
773 };
774 self.pending_fixup_deadline = self.pending_fixup_deadline.min(fixup.deadline());
775 self.pending_fixup_records.push(fixup);
776
777 // Post-invariant: no mutations to branches/labels data structures.
778 }
779
780 /// Inform the buffer of an unconditional branch at the given offset,
781 /// targeting the given label. May be used to optimize branches.
782 /// The last added label-use must correspond to this branch.
783 /// This must be called when the current offset is equal to `start`; i.e.,
784 /// before actually emitting the branch. This implies that for a branch that
785 /// uses a label and is eligible for optimizations by the MachBuffer, the
786 /// proper sequence is:
787 ///
788 /// - Call `use_label_at_offset()` to emit the fixup record.
789 /// - Call `add_uncond_branch()` to make note of the branch.
790 /// - Emit the bytes for the branch's machine code.
791 ///
792 /// Additional requirement: no labels may be bound between `start` and `end`
793 /// (exclusive on both ends).
794 pub fn add_uncond_branch(&mut self, start: CodeOffset, end: CodeOffset, target: MachLabel) {
795 debug_assert!(
796 !self.open_patchable,
797 "Branch instruction inserted within a patchable region"
798 );
799 assert!(self.cur_offset() == start);
800 debug_assert!(end > start);
801 assert!(!self.pending_fixup_records.is_empty());
802 let fixup = self.pending_fixup_records.len() - 1;
803 self.lazily_clear_labels_at_tail();
804 self.latest_branches.push(MachBranch {
805 start,
806 end,
807 target,
808 fixup,
809 inverted: None,
810 labels_at_this_branch: self.labels_at_tail.clone(),
811 });
812
813 // Post-invariant: we asserted branch start is current tail; the list of
814 // labels at branch is cloned from list of labels at current tail.
815 }
816
817 /// Inform the buffer of a conditional branch at the given offset,
818 /// targeting the given label. May be used to optimize branches.
819 /// The last added label-use must correspond to this branch.
820 ///
821 /// Additional requirement: no labels may be bound between `start` and `end`
822 /// (exclusive on both ends).
823 pub fn add_cond_branch(
824 &mut self,
825 start: CodeOffset,
826 end: CodeOffset,
827 target: MachLabel,
828 inverted: &[u8],
829 ) {
830 debug_assert!(
831 !self.open_patchable,
832 "Branch instruction inserted within a patchable region"
833 );
834 assert!(self.cur_offset() == start);
835 debug_assert!(end > start);
836 assert!(!self.pending_fixup_records.is_empty());
837 debug_assert!(
838 inverted.len() == (end - start) as usize,
839 "branch length = {}, but inverted length = {}",
840 end - start,
841 inverted.len()
842 );
843 let fixup = self.pending_fixup_records.len() - 1;
844 let inverted = Some(SmallVec::from(inverted));
845 self.lazily_clear_labels_at_tail();
846 self.latest_branches.push(MachBranch {
847 start,
848 end,
849 target,
850 fixup,
851 inverted,
852 labels_at_this_branch: self.labels_at_tail.clone(),
853 });
854
855 // Post-invariant: we asserted branch start is current tail; labels at
856 // branch list is cloned from list of labels at current tail.
857 }
858
859 fn truncate_last_branch(&mut self) {
860 debug_assert!(
861 !self.open_patchable,
862 "Branch instruction truncated within a patchable region"
863 );
864
865 self.lazily_clear_labels_at_tail();
866 // Invariants hold at this point.
867
868 let b = self.latest_branches.pop().unwrap();
869 assert!(b.end == self.cur_offset());
870
871 // State:
872 // [PRE CODE]
873 // Offset b.start, b.labels_at_this_branch:
874 // [BRANCH CODE]
875 // cur_off, self.labels_at_tail -->
876 // (end of buffer)
877 self.data.truncate(b.start as usize);
878 self.pending_fixup_records.truncate(b.fixup);
879
880 // Trim srclocs and debug tags now past the end of the buffer.
881 while let Some(last_srcloc) = self.srclocs.last_mut() {
882 if last_srcloc.end <= b.start {
883 break;
884 }
885 if last_srcloc.start < b.start {
886 last_srcloc.end = b.start;
887 break;
888 }
889 self.srclocs.pop();
890 }
891 while let Some(last_debug_tag) = self.debug_tags.last() {
892 if last_debug_tag.offset <= b.start {
893 break;
894 }
895 self.debug_tags.pop();
896 }
897
898 // State:
899 // [PRE CODE]
900 // cur_off, Offset b.start, b.labels_at_this_branch:
901 // (end of buffer)
902 //
903 // self.labels_at_tail --> (past end of buffer)
904 let cur_off = self.cur_offset();
905 self.labels_at_tail_off = cur_off;
906 // State:
907 // [PRE CODE]
908 // cur_off, Offset b.start, b.labels_at_this_branch,
909 // self.labels_at_tail:
910 // (end of buffer)
911 //
912 // resolve_label_offset(l) for l in labels_at_tail:
913 // (past end of buffer)
914
915 trace!(
916 "truncate_last_branch: truncated {:?}; off now {}",
917 b, cur_off
918 );
919
920 // Fix up resolved label offsets for labels at tail.
921 for &l in &self.labels_at_tail {
922 self.label_offsets[l.0 as usize] = cur_off;
923 }
924 // Old labels_at_this_branch are now at cur_off.
925 self.labels_at_tail.extend(b.labels_at_this_branch);
926
927 // Post-invariant: this operation is defined to truncate the buffer,
928 // which moves cur_off backward, and to move labels at the end of the
929 // buffer back to the start-of-branch offset.
930 //
931 // latest_branches satisfies all invariants:
932 // - it has no branches past the end of the buffer (branches are in
933 // order, we removed the last one, and we truncated the buffer to just
934 // before the start of that branch)
935 // - no labels were moved to lower offsets than the (new) cur_off, so
936 // the labels_at_this_branch list for any other branch need not change.
937 //
938 // labels_at_tail satisfies all invariants:
939 // - all labels that were at the tail after the truncated branch are
940 // moved backward to just before the branch, which becomes the new tail;
941 // thus every element in the list should remain (ensured by `.extend()`
942 // above).
943 // - all labels that refer to the new tail, which is the start-offset of
944 // the truncated branch, must be present. The `labels_at_this_branch`
945 // list in the truncated branch's record is a complete and precise list
946 // of exactly these labels; we append these to labels_at_tail.
947 // - labels_at_tail_off is at cur_off after truncation occurs, so the
948 // list is valid (not to be lazily cleared).
949 //
950 // The stated operation was performed:
951 // - For each label at the end of the buffer prior to this method, it
952 // now resolves to the new (truncated) end of the buffer: it must have
953 // been in `labels_at_tail` (this list is precise and complete, and
954 // the tail was at the end of the truncated branch on entry), and we
955 // iterate over this list and set `label_offsets` to the new tail.
956 // None of these labels could have been an alias (by invariant), so
957 // `label_offsets` is authoritative for each.
958 // - No other labels will be past the end of the buffer, because of the
959 // requirement that no labels be bound to the middle of branch ranges
960 // (see comments to `add_{cond,uncond}_branch()`).
961 // - The buffer is truncated to just before the last branch, and the
962 // fixup record referring to that last branch is removed.
963 }
964
965 /// Performs various optimizations on branches pointing at the current label.
966 pub fn optimize_branches(&mut self, ctrl_plane: &mut ControlPlane) {
967 if ctrl_plane.get_decision() {
968 return;
969 }
970
971 self.lazily_clear_labels_at_tail();
972 // Invariants valid at this point.
973
974 trace!(
975 "enter optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}",
976 self.latest_branches, self.labels_at_tail, self.pending_fixup_records
977 );
978
979 // We continue to munch on branches at the tail of the buffer until no
980 // more rules apply. Note that the loop only continues if a branch is
981 // actually truncated (or if labels are redirected away from a branch),
982 // so this always makes progress.
983 while let Some(b) = self.latest_branches.last() {
984 let cur_off = self.cur_offset();
985 trace!("optimize_branches: last branch {:?} at off {}", b, cur_off);
986 // If there has been any code emission since the end of the last branch or
987 // label definition, then there's nothing we can edit (because we
988 // don't move code once placed, only back up and overwrite), so
989 // clear the records and finish.
990 if b.end < cur_off {
991 break;
992 }
993
994 // If the "labels at this branch" list on this branch is
995 // longer than a threshold, don't do any simplification,
996 // and let the branch remain to separate those labels from
997 // the current tail. This avoids quadratic behavior (see
998 // #3468): otherwise, if a long string of "goto next;
999 // next:" patterns are emitted, all of the labels will
1000 // coalesce into a long list of aliases for the current
1001 // buffer tail. We must track all aliases of the current
1002 // tail for correctness, but we are also allowed to skip
1003 // optimization (removal) of any branch, so we take the
1004 // escape hatch here and let it stand. In effect this
1005 // "spreads" the many thousands of labels in the
1006 // pathological case among an actual (harmless but
1007 // suboptimal) instruction once per N labels.
1008 if b.labels_at_this_branch.len() > LABEL_LIST_THRESHOLD {
1009 break;
1010 }
1011
1012 // Invariant: we are looking at a branch that ends at the tail of
1013 // the buffer.
1014
1015 // For any branch, conditional or unconditional:
1016 // - If the target is a label at the current offset, then remove
1017 // the conditional branch, and reset all labels that targeted
1018 // the current offset (end of branch) to the truncated
1019 // end-of-code.
1020 //
1021 // Preserves execution semantics: a branch to its own fallthrough
1022 // address is equivalent to a no-op; in both cases, nextPC is the
1023 // fallthrough.
1024 if self.resolve_label_offset(b.target) == cur_off {
1025 trace!("branch with target == cur off; truncating");
1026 self.truncate_last_branch();
1027 continue;
1028 }
1029
1030 // If latest is an unconditional branch:
1031 //
1032 // - If the branch's target is not its own start address, then for
1033 // each label at the start of branch, make the label an alias of the
1034 // branch target, and remove the label from the "labels at this
1035 // branch" list.
1036 //
1037 // - Preserves execution semantics: an unconditional branch's
1038 // only effect is to set PC to a new PC; this change simply
1039 // collapses one step in the step-semantics.
1040 //
1041 // - Post-invariant: the labels that were bound to the start of
1042 // this branch become aliases, so they must not be present in any
1043 // labels-at-this-branch list or the labels-at-tail list. The
1044 // labels are removed form the latest-branch record's
1045 // labels-at-this-branch list, and are never placed in the
1046 // labels-at-tail list. Furthermore, it is correct that they are
1047 // not in either list, because they are now aliases, and labels
1048 // that are aliases remain aliases forever.
1049 //
1050 // - If there is a prior unconditional branch that ends just before
1051 // this one begins, and this branch has no labels bound to its
1052 // start, then we can truncate this branch, because it is entirely
1053 // unreachable (we have redirected all labels that make it
1054 // reachable otherwise). Do so and continue around the loop.
1055 //
1056 // - Preserves execution semantics: the branch is unreachable,
1057 // because execution can only flow into an instruction from the
1058 // prior instruction's fallthrough or from a branch bound to that
1059 // instruction's start offset. Unconditional branches have no
1060 // fallthrough, so if the prior instruction is an unconditional
1061 // branch, no fallthrough entry can happen. The
1062 // labels-at-this-branch list is complete (by invariant), so if it
1063 // is empty, then the instruction is entirely unreachable. Thus,
1064 // it can be removed.
1065 //
1066 // - Post-invariant: ensured by truncate_last_branch().
1067 //
1068 // - If there is a prior conditional branch whose target label
1069 // resolves to the current offset (branches around the
1070 // unconditional branch), then remove the unconditional branch,
1071 // and make the target of the unconditional the target of the
1072 // conditional instead.
1073 //
1074 // - Preserves execution semantics: previously we had:
1075 //
1076 // L1:
1077 // cond_br L2
1078 // br L3
1079 // L2:
1080 // (end of buffer)
1081 //
1082 // by removing the last branch, we have:
1083 //
1084 // L1:
1085 // cond_br L2
1086 // L2:
1087 // (end of buffer)
1088 //
1089 // we then fix up the records for the conditional branch to
1090 // have:
1091 //
1092 // L1:
1093 // cond_br.inverted L3
1094 // L2:
1095 //
1096 // In the original code, control flow reaches L2 when the
1097 // conditional branch's predicate is true, and L3 otherwise. In
1098 // the optimized code, the same is true.
1099 //
1100 // - Post-invariant: all edits to latest_branches and
1101 // labels_at_tail are performed by `truncate_last_branch()`,
1102 // which maintains the invariants at each step.
1103
1104 if b.is_uncond() {
1105 // Set any label equal to current branch's start as an alias of
1106 // the branch's target, if the target is not the branch itself
1107 // (i.e., an infinite loop).
1108 //
1109 // We cannot perform this aliasing if the target of this branch
1110 // ultimately aliases back here; if so, we need to keep this
1111 // branch, so break out of this loop entirely (and clear the
1112 // latest-branches list below).
1113 //
1114 // Note that this check is what prevents cycles from forming in
1115 // `self.label_aliases`. To see why, consider an arbitrary start
1116 // state:
1117 //
1118 // label_aliases[L1] = L2, label_aliases[L2] = L3, ..., up to
1119 // Ln, which is not aliased.
1120 //
1121 // We would create a cycle if we assigned label_aliases[Ln]
1122 // = L1. Note that the below assignment is the only write
1123 // to label_aliases.
1124 //
1125 // By our other invariants, we have that Ln (`l` below)
1126 // resolves to the offset `b.start`, because it is in the
1127 // set `b.labels_at_this_branch`.
1128 //
1129 // If L1 were already aliased, through some arbitrarily deep
1130 // chain, to Ln, then it must also resolve to this offset
1131 // `b.start`.
1132 //
1133 // By checking the resolution of `L1` against this offset,
1134 // and aborting this branch-simplification if they are
1135 // equal, we prevent the below assignment from ever creating
1136 // a cycle.
1137 if self.resolve_label_offset(b.target) != b.start {
1138 let redirected = b.labels_at_this_branch.len();
1139 for &l in &b.labels_at_this_branch {
1140 trace!(
1141 " -> label at start of branch {:?} redirected to target {:?}",
1142 l, b.target
1143 );
1144 self.label_aliases[l.0 as usize] = b.target;
1145 // NOTE: we continue to ensure the invariant that labels
1146 // pointing to tail of buffer are in `labels_at_tail`
1147 // because we already ensured above that the last branch
1148 // cannot have a target of `cur_off`; so we never have
1149 // to put the label into `labels_at_tail` when moving it
1150 // here.
1151 }
1152 // Maintain invariant: all branches have been redirected
1153 // and are no longer pointing at the start of this branch.
1154 let mut_b = self.latest_branches.last_mut().unwrap();
1155 mut_b.labels_at_this_branch.clear();
1156
1157 if redirected > 0 {
1158 trace!(" -> after label redirects, restarting loop");
1159 continue;
1160 }
1161 } else {
1162 break;
1163 }
1164
1165 let b = self.latest_branches.last().unwrap();
1166
1167 // Examine any immediately preceding branch.
1168 if self.latest_branches.len() > 1 {
1169 let prev_b = &self.latest_branches[self.latest_branches.len() - 2];
1170 trace!(" -> more than one branch; prev_b = {:?}", prev_b);
1171 // This uncond is immediately after another uncond; we
1172 // should have already redirected labels to this uncond away
1173 // (but check to be sure); so we can truncate this uncond.
1174 if prev_b.is_uncond()
1175 && prev_b.end == b.start
1176 && b.labels_at_this_branch.is_empty()
1177 {
1178 trace!(" -> uncond follows another uncond; truncating");
1179 self.truncate_last_branch();
1180 continue;
1181 }
1182
1183 // This uncond is immediately after a conditional, and the
1184 // conditional's target is the end of this uncond, and we've
1185 // already redirected labels to this uncond away; so we can
1186 // truncate this uncond, flip the sense of the conditional, and
1187 // set the conditional's target (in `latest_branches` and in
1188 // `fixup_records`) to the uncond's target.
1189 if prev_b.is_cond()
1190 && prev_b.end == b.start
1191 && self.resolve_label_offset(prev_b.target) == cur_off
1192 {
1193 trace!(
1194 " -> uncond follows a conditional, and conditional's target resolves to current offset"
1195 );
1196 // Save the target of the uncond (this becomes the
1197 // target of the cond), and truncate the uncond.
1198 let target = b.target;
1199 let data = prev_b.inverted.clone().unwrap();
1200 self.truncate_last_branch();
1201
1202 // Mutate the code and cond branch.
1203 let off_before_edit = self.cur_offset();
1204 let prev_b = self.latest_branches.last_mut().unwrap();
1205 let not_inverted = SmallVec::from(
1206 &self.data[(prev_b.start as usize)..(prev_b.end as usize)],
1207 );
1208
1209 // Low-level edit: replaces bytes of branch with
1210 // inverted form. cur_off remains the same afterward, so
1211 // we do not need to modify label data structures.
1212 self.data.truncate(prev_b.start as usize);
1213 self.data.extend_from_slice(&data[..]);
1214
1215 // Save the original code as the inversion of the
1216 // inverted branch, in case we later edit this branch
1217 // again.
1218 prev_b.inverted = Some(not_inverted);
1219 self.pending_fixup_records[prev_b.fixup].label = target;
1220 trace!(" -> reassigning target of condbr to {:?}", target);
1221 prev_b.target = target;
1222 debug_assert_eq!(off_before_edit, self.cur_offset());
1223 continue;
1224 }
1225 }
1226 }
1227
1228 // If we couldn't do anything with the last branch, then break.
1229 break;
1230 }
1231
1232 self.purge_latest_branches();
1233
1234 trace!(
1235 "leave optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}",
1236 self.latest_branches, self.labels_at_tail, self.pending_fixup_records
1237 );
1238 }
1239
1240 fn purge_latest_branches(&mut self) {
1241 // All of our branch simplification rules work only if a branch ends at
1242 // the tail of the buffer, with no following code; and branches are in
1243 // order in latest_branches; so if the last entry ends prior to
1244 // cur_offset, then clear all entries.
1245 let cur_off = self.cur_offset();
1246 if let Some(l) = self.latest_branches.last() {
1247 if l.end < cur_off {
1248 trace!("purge_latest_branches: removing branch {:?}", l);
1249 self.latest_branches.clear();
1250 }
1251 }
1252
1253 // Post-invariant: no invariant requires any branch to appear in
1254 // `latest_branches`; it is always optional. The list-clear above thus
1255 // preserves all semantics.
1256 }
1257
1258 /// Emit a trap at some point in the future with the specified code and
1259 /// stack map.
1260 ///
1261 /// This function returns a [`MachLabel`] which will be the future address
1262 /// of the trap. Jumps should refer to this label, likely by using the
1263 /// [`MachBuffer::use_label_at_offset`] method, to get a relocation
1264 /// patched in once the address of the trap is known.
1265 ///
1266 /// This will batch all traps into the end of the function.
1267 pub fn defer_trap(&mut self, code: TrapCode) -> MachLabel {
1268 let label = self.get_label();
1269 self.pending_traps.push(MachLabelTrap {
1270 label,
1271 code,
1272 loc: self.cur_srcloc.map(|(_start, loc)| loc),
1273 });
1274 label
1275 }
1276
1277 /// Is an island needed within the next N bytes?
1278 pub fn island_needed(&self, distance: CodeOffset) -> bool {
1279 let deadline = match self.fixup_records.peek() {
1280 Some(fixup) => fixup.deadline().min(self.pending_fixup_deadline),
1281 None => self.pending_fixup_deadline,
1282 };
1283 deadline < u32::MAX && self.worst_case_end_of_island(distance) > deadline
1284 }
1285
1286 /// Returns the maximal offset that islands can reach if `distance` more
1287 /// bytes are appended.
1288 ///
1289 /// This is used to determine if veneers need insertions since jumps that
1290 /// can't reach past this point must get a veneer of some form.
1291 fn worst_case_end_of_island(&self, distance: CodeOffset) -> CodeOffset {
1292 // Assume that all fixups will require veneers and that the veneers are
1293 // the worst-case size for each platform. This is an over-generalization
1294 // to avoid iterating over the `fixup_records` list or maintaining
1295 // information about it as we go along.
1296 let island_worst_case_size = ((self.fixup_records.len() + self.pending_fixup_records.len())
1297 as u32)
1298 * (I::LabelUse::worst_case_veneer_size())
1299 + self.pending_constants_size
1300 + (self.pending_traps.len() * I::TRAP_OPCODE.len()) as u32;
1301 self.cur_offset()
1302 .saturating_add(distance)
1303 .saturating_add(island_worst_case_size)
1304 }
1305
1306 /// Emit all pending constants and required pending veneers.
1307 ///
1308 /// Should only be called if `island_needed()` returns true, i.e., if we
1309 /// actually reach a deadline. It's not necessarily a problem to do so
1310 /// otherwise but it may result in unnecessary work during emission.
1311 pub fn emit_island(&mut self, distance: CodeOffset, ctrl_plane: &mut ControlPlane) {
1312 self.emit_island_maybe_forced(ForceVeneers::No, distance, ctrl_plane);
1313 }
1314
1315 /// Same as `emit_island`, but an internal API with a `force_veneers`
1316 /// argument to force all veneers to always get emitted for debugging.
1317 fn emit_island_maybe_forced(
1318 &mut self,
1319 force_veneers: ForceVeneers,
1320 distance: CodeOffset,
1321 ctrl_plane: &mut ControlPlane,
1322 ) {
1323 // We're going to purge fixups, so no latest-branch editing can happen
1324 // anymore.
1325 self.latest_branches.clear();
1326
1327 // End the current location tracking since anything emitted during this
1328 // function shouldn't be attributed to whatever the current source
1329 // location is.
1330 //
1331 // Note that the current source location, if it's set right now, will be
1332 // restored at the end of this island emission.
1333 let cur_loc = self.cur_srcloc.map(|(_, loc)| loc);
1334 if cur_loc.is_some() {
1335 self.end_srcloc();
1336 }
1337
1338 let forced_threshold = self.worst_case_end_of_island(distance);
1339
1340 // First flush out all traps/constants so we have more labels in case
1341 // fixups are applied against these labels.
1342 //
1343 // Note that traps are placed first since this typically happens at the
1344 // end of the function and for disassemblers we try to keep all the code
1345 // contiguously together.
1346 for MachLabelTrap { label, code, loc } in mem::take(&mut self.pending_traps) {
1347 // If this trap has source information associated with it then
1348 // emit this information for the trap instruction going out now too.
1349 if let Some(loc) = loc {
1350 self.start_srcloc(loc);
1351 }
1352 self.align_to(I::LabelUse::ALIGN);
1353 self.bind_label(label, ctrl_plane);
1354 self.add_trap(code);
1355 self.put_data(I::TRAP_OPCODE);
1356 if loc.is_some() {
1357 self.end_srcloc();
1358 }
1359 }
1360
1361 for constant in mem::take(&mut self.pending_constants) {
1362 let MachBufferConstant { align, size, .. } = self.constants[constant];
1363 let label = self.constants[constant].upcoming_label.take().unwrap();
1364 self.align_to(align);
1365 self.bind_label(label, ctrl_plane);
1366 self.used_constants.push((constant, self.cur_offset()));
1367 self.get_appended_space(size);
1368 }
1369
1370 // Either handle all pending fixups because they're ready or move them
1371 // onto the `BinaryHeap` tracking all pending fixups if they aren't
1372 // ready.
1373 assert!(self.latest_branches.is_empty());
1374 for fixup in mem::take(&mut self.pending_fixup_records) {
1375 if self.should_apply_fixup(&fixup, forced_threshold) {
1376 self.handle_fixup(fixup, force_veneers, forced_threshold);
1377 } else {
1378 self.fixup_records.push(fixup);
1379 }
1380 }
1381 self.pending_fixup_deadline = u32::MAX;
1382 while let Some(fixup) = self.fixup_records.peek() {
1383 trace!("emit_island: fixup {:?}", fixup);
1384
1385 // If this fixup shouldn't be applied, that means its label isn't
1386 // defined yet and there'll be remaining space to apply a veneer if
1387 // necessary in the future after this island. In that situation
1388 // because `fixup_records` is sorted by deadline this loop can
1389 // exit.
1390 if !self.should_apply_fixup(fixup, forced_threshold) {
1391 break;
1392 }
1393
1394 let fixup = self.fixup_records.pop().unwrap();
1395 self.handle_fixup(fixup, force_veneers, forced_threshold);
1396 }
1397
1398 if let Some(loc) = cur_loc {
1399 self.start_srcloc(loc);
1400 }
1401 }
1402
1403 fn should_apply_fixup(&self, fixup: &MachLabelFixup<I>, forced_threshold: CodeOffset) -> bool {
1404 let label_offset = self.resolve_label_offset(fixup.label);
1405 label_offset != UNKNOWN_LABEL_OFFSET || fixup.deadline() < forced_threshold
1406 }
1407
1408 fn handle_fixup(
1409 &mut self,
1410 fixup: MachLabelFixup<I>,
1411 force_veneers: ForceVeneers,
1412 forced_threshold: CodeOffset,
1413 ) {
1414 let MachLabelFixup {
1415 label,
1416 offset,
1417 kind,
1418 } = fixup;
1419 let start = offset as usize;
1420 let end = (offset + kind.patch_size()) as usize;
1421 let label_offset = self.resolve_label_offset(label);
1422
1423 if label_offset != UNKNOWN_LABEL_OFFSET {
1424 // If the offset of the label for this fixup is known then
1425 // we're going to do something here-and-now. We're either going
1426 // to patch the original offset because it's an in-bounds jump,
1427 // or we're going to generate a veneer, patch the fixup to jump
1428 // to the veneer, and then keep going.
1429 //
1430 // If the label comes after the original fixup, then we should
1431 // be guaranteed that the jump is in-bounds. Otherwise there's
1432 // a bug somewhere because this method wasn't called soon
1433 // enough. All forward-jumps are tracked and should get veneers
1434 // before their deadline comes and they're unable to jump
1435 // further.
1436 //
1437 // Otherwise if the label is before the fixup, then that's a
1438 // backwards jump. If it's past the maximum negative range
1439 // then we'll emit a veneer that to jump forward to which can
1440 // then jump backwards.
1441 let veneer_required = if label_offset >= offset {
1442 assert!((label_offset - offset) <= kind.max_pos_range());
1443 false
1444 } else {
1445 (offset - label_offset) > kind.max_neg_range()
1446 };
1447 trace!(
1448 " -> label_offset = {}, known, required = {} (pos {} neg {})",
1449 label_offset,
1450 veneer_required,
1451 kind.max_pos_range(),
1452 kind.max_neg_range()
1453 );
1454
1455 if (force_veneers == ForceVeneers::Yes && kind.supports_veneer()) || veneer_required {
1456 self.emit_veneer(label, offset, kind);
1457 } else {
1458 let slice = &mut self.data[start..end];
1459 trace!(
1460 "patching in-range! slice = {slice:?}; offset = {offset:#x}; label_offset = {label_offset:#x}"
1461 );
1462 kind.patch(slice, offset, label_offset);
1463 }
1464 } else {
1465 // If the offset of this label is not known at this time then
1466 // that means that a veneer is required because after this
1467 // island the target can't be in range of the original target.
1468 assert!(forced_threshold - offset > kind.max_pos_range());
1469 self.emit_veneer(label, offset, kind);
1470 }
1471 }
1472
1473 /// Emits a "veneer" the `kind` code at `offset` to jump to `label`.
1474 ///
1475 /// This will generate extra machine code, using `kind`, to get a
1476 /// larger-jump-kind than `kind` allows. The code at `offset` is then
1477 /// patched to jump to our new code, and then the new code is enqueued for
1478 /// a fixup to get processed at some later time.
1479 fn emit_veneer(&mut self, label: MachLabel, offset: CodeOffset, kind: I::LabelUse) {
1480 // If this `kind` doesn't support a veneer then that's a bug in the
1481 // backend because we need to implement support for such a veneer.
1482 assert!(
1483 kind.supports_veneer(),
1484 "jump beyond the range of {kind:?} but a veneer isn't supported",
1485 );
1486
1487 // Allocate space for a veneer in the island.
1488 self.align_to(I::LabelUse::ALIGN);
1489 let veneer_offset = self.cur_offset();
1490 trace!("making a veneer at {}", veneer_offset);
1491 let start = offset as usize;
1492 let end = (offset + kind.patch_size()) as usize;
1493 let slice = &mut self.data[start..end];
1494 // Patch the original label use to refer to the veneer.
1495 trace!(
1496 "patching original at offset {} to veneer offset {}",
1497 offset, veneer_offset
1498 );
1499 kind.patch(slice, offset, veneer_offset);
1500 // Generate the veneer.
1501 let veneer_slice = self.get_appended_space(kind.veneer_size() as usize);
1502 let (veneer_fixup_off, veneer_label_use) =
1503 kind.generate_veneer(veneer_slice, veneer_offset);
1504 trace!(
1505 "generated veneer; fixup offset {}, label_use {:?}",
1506 veneer_fixup_off, veneer_label_use
1507 );
1508 // Register a new use of `label` with our new veneer fixup and
1509 // offset. This'll recalculate deadlines accordingly and
1510 // enqueue this fixup to get processed at some later
1511 // time.
1512 self.use_label_at_offset(veneer_fixup_off, label, veneer_label_use);
1513 }
1514
1515 fn finish_emission_maybe_forcing_veneers(
1516 &mut self,
1517 force_veneers: ForceVeneers,
1518 ctrl_plane: &mut ControlPlane,
1519 ) {
1520 while !self.pending_constants.is_empty()
1521 || !self.pending_traps.is_empty()
1522 || !self.fixup_records.is_empty()
1523 || !self.pending_fixup_records.is_empty()
1524 {
1525 // `emit_island()` will emit any pending veneers and constants, and
1526 // as a side-effect, will also take care of any fixups with resolved
1527 // labels eagerly.
1528 self.emit_island_maybe_forced(force_veneers, u32::MAX, ctrl_plane);
1529 }
1530
1531 // Ensure that all labels have been fixed up after the last island is emitted. This is a
1532 // full (release-mode) assert because an unresolved label means the emitted code is
1533 // incorrect.
1534 assert!(self.fixup_records.is_empty());
1535 assert!(self.pending_fixup_records.is_empty());
1536 }
1537
1538 /// Finish any deferred emissions and/or fixups.
1539 pub fn finish(
1540 mut self,
1541 constants: &VCodeConstants,
1542 ctrl_plane: &mut ControlPlane,
1543 ) -> MachBufferFinalized<Stencil> {
1544 let _tt = timing::vcode_emit_finish();
1545
1546 self.finish_emission_maybe_forcing_veneers(ForceVeneers::No, ctrl_plane);
1547
1548 let alignment = self.finish_constants(constants);
1549
1550 // Resolve all labels to their offsets.
1551 let finalized_relocs = self
1552 .relocs
1553 .iter()
1554 .map(|reloc| FinalizedMachReloc {
1555 offset: reloc.offset,
1556 kind: reloc.kind,
1557 addend: reloc.addend,
1558 target: match &reloc.target {
1559 RelocTarget::ExternalName(name) => {
1560 FinalizedRelocTarget::ExternalName(name.clone())
1561 }
1562 RelocTarget::Label(label) => {
1563 FinalizedRelocTarget::Func(self.resolve_label_offset(*label))
1564 }
1565 },
1566 })
1567 .collect();
1568
1569 let finalized_exception_handlers = self
1570 .exception_handlers
1571 .iter()
1572 .map(|handler| handler.finalize(|label| self.resolve_label_offset(label)))
1573 .collect();
1574
1575 let mut srclocs = self.srclocs;
1576 srclocs.sort_by_key(|entry| entry.start);
1577
1578 MachBufferFinalized {
1579 data: self.data,
1580 relocs: finalized_relocs,
1581 traps: self.traps,
1582 call_sites: self.call_sites,
1583 patchable_call_sites: self.patchable_call_sites,
1584 exception_handlers: finalized_exception_handlers,
1585 srclocs,
1586 debug_tags: self.debug_tags,
1587 debug_tag_pool: self.debug_tag_pool,
1588 user_stack_maps: self.user_stack_maps,
1589 unwind_info: self.unwind_info,
1590 alignment,
1591 frame_layout: self.frame_layout,
1592 nop_units: I::gen_nop_units(),
1593 }
1594 }
1595
1596 /// Add an external relocation at the given offset.
1597 pub fn add_reloc_at_offset<T: Into<RelocTarget> + Clone>(
1598 &mut self,
1599 offset: CodeOffset,
1600 kind: Reloc,
1601 target: &T,
1602 addend: Addend,
1603 ) {
1604 let target: RelocTarget = target.clone().into();
1605 // FIXME(#3277): This should use `I::LabelUse::from_reloc` to optionally
1606 // generate a label-use statement to track whether an island is possibly
1607 // needed to escape this function to actually get to the external name.
1608 // This is most likely to come up on AArch64 where calls between
1609 // functions use a 26-bit signed offset which gives +/- 64MB. This means
1610 // that if a function is 128MB in size and there's a call in the middle
1611 // it's impossible to reach the actual target. Also, while it's
1612 // technically possible to jump to the start of a function and then jump
1613 // further, island insertion below always inserts islands after
1614 // previously appended code so for Cranelift's own implementation this
1615 // is also a problem for 64MB functions on AArch64 which start with a
1616 // call instruction, those won't be able to escape.
1617 //
1618 // Ideally what needs to happen here is that a `LabelUse` is
1619 // transparently generated (or call-sites of this function are audited
1620 // to generate a `LabelUse` instead) and tracked internally. The actual
1621 // relocation would then change over time if and when a veneer is
1622 // inserted, where the relocation here would be patched by this
1623 // `MachBuffer` to jump to the veneer. The problem, though, is that all
1624 // this still needs to end up, in the case of a singular function,
1625 // generating a final relocation pointing either to this particular
1626 // relocation or to the veneer inserted. Additionally
1627 // `MachBuffer` needs the concept of a label which will never be
1628 // resolved, so `emit_island` doesn't trip over not actually ever
1629 // knowing what some labels are. Currently the loop in
1630 // `finish_emission_maybe_forcing_veneers` would otherwise infinitely
1631 // loop.
1632 //
1633 // For now this means that because relocs aren't tracked at all that
1634 // AArch64 functions have a rough size limits of 64MB. For now that's
1635 // somewhat reasonable and the failure mode is a panic in `MachBuffer`
1636 // when a relocation can't otherwise be resolved later, so it shouldn't
1637 // actually result in any memory unsafety or anything like that.
1638 self.relocs.push(MachReloc {
1639 offset,
1640 kind,
1641 target,
1642 addend,
1643 });
1644 }
1645
1646 /// Add an external relocation at the current offset.
1647 pub fn add_reloc<T: Into<RelocTarget> + Clone>(
1648 &mut self,
1649 kind: Reloc,
1650 target: &T,
1651 addend: Addend,
1652 ) {
1653 self.add_reloc_at_offset(self.data.len() as CodeOffset, kind, target, addend);
1654 }
1655
1656 /// Add a trap record at the current offset.
1657 pub fn add_trap(&mut self, code: TrapCode) {
1658 self.traps.push(MachTrap {
1659 offset: self.data.len() as CodeOffset,
1660 code,
1661 });
1662 }
1663
1664 /// Add a call-site record at the current offset.
1665 pub fn add_call_site(&mut self) {
1666 self.add_try_call_site(None, core::iter::empty());
1667 }
1668
1669 /// Add a call-site record at the current offset with exception
1670 /// handlers.
1671 pub fn add_try_call_site(
1672 &mut self,
1673 frame_offset: Option<u32>,
1674 exception_handlers: impl Iterator<Item = MachExceptionHandler>,
1675 ) {
1676 let start = u32::try_from(self.exception_handlers.len()).unwrap();
1677 self.exception_handlers.extend(exception_handlers);
1678 let end = u32::try_from(self.exception_handlers.len()).unwrap();
1679 let exception_handler_range = start..end;
1680
1681 self.call_sites.push(MachCallSite {
1682 ret_addr: self.data.len() as CodeOffset,
1683 frame_offset,
1684 exception_handler_range,
1685 });
1686 }
1687
1688 /// Add a patchable call record at the current offset The actual
1689 /// call is expected to have been emitted; the VCodeInst trait
1690 /// specifies how to NOP it out, and we carry that information to
1691 /// the finalized Machbuffer.
1692 pub fn add_patchable_call_site(&mut self, len: u32) {
1693 self.patchable_call_sites.push(MachPatchableCallSite {
1694 ret_addr: self.cur_offset(),
1695 len,
1696 });
1697 }
1698
1699 /// Add an unwind record at the current offset.
1700 pub fn add_unwind(&mut self, unwind: UnwindInst) {
1701 self.unwind_info.push((self.cur_offset(), unwind));
1702 }
1703
1704 /// Set the `SourceLoc` for code from this offset until the offset at the
1705 /// next call to `end_srcloc()`.
1706 /// Returns the current [CodeOffset] and [RelSourceLoc].
1707 pub fn start_srcloc(&mut self, loc: RelSourceLoc) -> (CodeOffset, RelSourceLoc) {
1708 let cur = (self.cur_offset(), loc);
1709 self.cur_srcloc = Some(cur);
1710 cur
1711 }
1712
1713 /// Mark the end of the `SourceLoc` segment started at the last
1714 /// `start_srcloc()` call.
1715 pub fn end_srcloc(&mut self) {
1716 let (start, loc) = self
1717 .cur_srcloc
1718 .take()
1719 .expect("end_srcloc() called without start_srcloc()");
1720 let end = self.cur_offset();
1721 // Skip zero-length extends.
1722 debug_assert!(end >= start);
1723 if end > start {
1724 self.srclocs.push(MachSrcLoc { start, end, loc });
1725 }
1726 }
1727
1728 /// Push a user stack map onto this buffer.
1729 ///
1730 /// The stack map is associated with the given `return_addr` code
1731 /// offset. This must be the PC for the instruction just *after* this stack
1732 /// map's associated instruction. For example in the sequence `call $foo;
1733 /// add r8, rax`, the `return_addr` must be the offset of the start of the
1734 /// `add` instruction.
1735 ///
1736 /// Stack maps must be pushed in sorted `return_addr` order.
1737 pub fn push_user_stack_map(
1738 &mut self,
1739 emit_state: &I::State,
1740 return_addr: CodeOffset,
1741 mut stack_map: ir::UserStackMap,
1742 ) {
1743 let span = emit_state.frame_layout().active_size();
1744 trace!("Adding user stack map @ {return_addr:#x} spanning {span} bytes: {stack_map:?}");
1745
1746 debug_assert!(
1747 self.user_stack_maps
1748 .last()
1749 .map_or(true, |(prev_addr, _, _)| *prev_addr < return_addr),
1750 "pushed stack maps out of order: {} is not less than {}",
1751 self.user_stack_maps.last().unwrap().0,
1752 return_addr,
1753 );
1754
1755 stack_map.finalize(emit_state.frame_layout().sp_to_sized_stack_slots());
1756 self.user_stack_maps.push((return_addr, span, stack_map));
1757 }
1758
1759 /// Push a debug tag associated with the current buffer offset.
1760 pub fn push_debug_tags(&mut self, pos: MachDebugTagPos, tags: &[DebugTag]) {
1761 trace!("debug tags at offset {}: {tags:?}", self.cur_offset());
1762 let start = u32::try_from(self.debug_tag_pool.len()).unwrap();
1763 self.debug_tag_pool.extend(tags.iter().cloned());
1764 let end = u32::try_from(self.debug_tag_pool.len()).unwrap();
1765 self.debug_tags.push(MachDebugTags {
1766 offset: self.cur_offset(),
1767 pos,
1768 range: start..end,
1769 });
1770 }
1771
1772 /// Increase the alignment of the buffer to the given alignment if bigger
1773 /// than the current alignment.
1774 pub fn set_log2_min_function_alignment(&mut self, align_to: u8) {
1775 self.min_alignment = self.min_alignment.max(
1776 1u32.checked_shl(u32::from(align_to))
1777 .expect("log2_min_function_alignment too large"),
1778 );
1779 }
1780
1781 /// Set the frame layout metadata.
1782 pub fn set_frame_layout(&mut self, frame_layout: MachBufferFrameLayout) {
1783 debug_assert!(self.frame_layout.is_none());
1784 self.frame_layout = Some(frame_layout);
1785 }
1786}
1787
1788impl<I: VCodeInst> Extend<u8> for MachBuffer<I> {
1789 fn extend<T: IntoIterator<Item = u8>>(&mut self, iter: T) {
1790 for b in iter {
1791 self.put1(b);
1792 }
1793 }
1794}
1795
1796impl<T: CompilePhase> MachBufferFinalized<T> {
1797 /// Get a list of source location mapping tuples in sorted-by-start-offset order.
1798 pub fn get_srclocs_sorted(&self) -> &[T::MachSrcLocType] {
1799 &self.srclocs[..]
1800 }
1801
1802 /// Get all debug tags, sorted by associated offset.
1803 pub fn debug_tags(&self) -> impl Iterator<Item = MachBufferDebugTagList<'_>> {
1804 self.debug_tags.iter().map(|tags| {
1805 let start = usize::try_from(tags.range.start).unwrap();
1806 let end = usize::try_from(tags.range.end).unwrap();
1807 MachBufferDebugTagList {
1808 offset: tags.offset,
1809 pos: tags.pos,
1810 tags: &self.debug_tag_pool[start..end],
1811 }
1812 })
1813 }
1814
1815 /// Get the total required size for the code.
1816 pub fn total_size(&self) -> CodeOffset {
1817 self.data.len() as CodeOffset
1818 }
1819
1820 /// Return the code in this mach buffer as a hex string for testing purposes.
1821 pub fn stringify_code_bytes(&self) -> String {
1822 // This is pretty lame, but whatever ..
1823 use std::fmt::Write;
1824 let mut s = String::with_capacity(self.data.len() * 2);
1825 for b in &self.data {
1826 write!(&mut s, "{b:02X}").unwrap();
1827 }
1828 s
1829 }
1830
1831 /// Get the code bytes.
1832 pub fn data(&self) -> &[u8] {
1833 // N.B.: we emit every section into the .text section as far as
1834 // the `CodeSink` is concerned; we do not bother to segregate
1835 // the contents into the actual program text, the jumptable and the
1836 // rodata (constant pool). This allows us to generate code assuming
1837 // that these will not be relocated relative to each other, and avoids
1838 // having to designate each section as belonging in one of the three
1839 // fixed categories defined by `CodeSink`. If this becomes a problem
1840 // later (e.g. because of memory permissions or similar), we can
1841 // add this designation and segregate the output; take care, however,
1842 // to add the appropriate relocations in this case.
1843
1844 &self.data[..]
1845 }
1846
1847 /// Get a mutable slice of the code bytes, allowing patching
1848 /// post-passes.
1849 pub fn data_mut(&mut self) -> &mut [u8] {
1850 &mut self.data[..]
1851 }
1852
1853 /// Get the list of external relocations for this code.
1854 pub fn relocs(&self) -> &[FinalizedMachReloc] {
1855 &self.relocs[..]
1856 }
1857
1858 /// Get the list of trap records for this code.
1859 pub fn traps(&self) -> &[MachTrap] {
1860 &self.traps[..]
1861 }
1862
1863 /// Get the user stack map metadata for this code.
1864 pub fn user_stack_maps(&self) -> &[(CodeOffset, u32, ir::UserStackMap)] {
1865 &self.user_stack_maps
1866 }
1867
1868 /// Take this buffer's user strack map metadata.
1869 pub fn take_user_stack_maps(&mut self) -> SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]> {
1870 mem::take(&mut self.user_stack_maps)
1871 }
1872
1873 /// Get the list of call sites for this code, along with
1874 /// associated exception handlers.
1875 ///
1876 /// Each item yielded by the returned iterator is a struct with:
1877 ///
1878 /// - The call site metadata record, with a `ret_addr` field
1879 /// directly accessible and denoting the offset of the return
1880 /// address into this buffer's code.
1881 /// - The slice of pairs of exception tags and code offsets
1882 /// denoting exception-handler entry points associated with this
1883 /// call site.
1884 pub fn call_sites(&self) -> impl Iterator<Item = FinalizedMachCallSite<'_>> + '_ {
1885 self.call_sites.iter().map(|call_site| {
1886 let handler_range = call_site.exception_handler_range.clone();
1887 let handler_range = usize::try_from(handler_range.start).unwrap()
1888 ..usize::try_from(handler_range.end).unwrap();
1889 FinalizedMachCallSite {
1890 ret_addr: call_site.ret_addr,
1891 frame_offset: call_site.frame_offset,
1892 exception_handlers: &self.exception_handlers[handler_range],
1893 }
1894 })
1895 }
1896
1897 /// Get the frame layout, if known.
1898 pub fn frame_layout(&self) -> Option<&MachBufferFrameLayout> {
1899 self.frame_layout.as_ref()
1900 }
1901
1902 /// Get the list of patchable call sites for this code.
1903 ///
1904 /// Each location in the buffer contains the bytes for a call
1905 /// instruction to the specified target. If the call is to be
1906 /// patched out, the bytes in the region should be replaced with
1907 /// those given in the `MachBufferFinalized::nop` array, repeated
1908 /// as many times as necessary. (The length of the patchable
1909 /// region is guaranteed to be an integer multiple of that NOP
1910 /// unit size.)
1911 pub fn patchable_call_sites(&self) -> impl Iterator<Item = &MachPatchableCallSite> + '_ {
1912 self.patchable_call_sites.iter()
1913 }
1914}
1915
1916/// An item in the exception-handler list for a callsite, with label
1917/// references. Items are interpreted in left-to-right order and the
1918/// first match wins.
1919#[derive(Clone, Copy, Debug, PartialEq, Eq)]
1920pub enum MachExceptionHandler {
1921 /// A specific tag (in the current dynamic context) should be
1922 /// handled by the code at the given offset.
1923 Tag(ExceptionTag, MachLabel),
1924 /// All exceptions should be handled by the code at the given
1925 /// offset.
1926 Default(MachLabel),
1927 /// The dynamic context for interpreting tags is updated to the
1928 /// value stored in the given machine location (in this frame's
1929 /// context).
1930 Context(ExceptionContextLoc),
1931}
1932
1933impl MachExceptionHandler {
1934 fn finalize<F: Fn(MachLabel) -> CodeOffset>(self, f: F) -> FinalizedMachExceptionHandler {
1935 match self {
1936 Self::Tag(tag, label) => FinalizedMachExceptionHandler::Tag(tag, f(label)),
1937 Self::Default(label) => FinalizedMachExceptionHandler::Default(f(label)),
1938 Self::Context(loc) => FinalizedMachExceptionHandler::Context(loc),
1939 }
1940 }
1941}
1942
1943/// An item in the exception-handler list for a callsite, with final
1944/// (lowered) code offsets. Items are interpreted in left-to-right
1945/// order and the first match wins.
1946#[derive(Clone, Copy, Debug, PartialEq, Eq)]
1947#[cfg_attr(
1948 feature = "enable-serde",
1949 derive(serde_derive::Serialize, serde_derive::Deserialize)
1950)]
1951pub enum FinalizedMachExceptionHandler {
1952 /// A specific tag (in the current dynamic context) should be
1953 /// handled by the code at the given offset.
1954 Tag(ExceptionTag, CodeOffset),
1955 /// All exceptions should be handled by the code at the given
1956 /// offset.
1957 Default(CodeOffset),
1958 /// The dynamic context for interpreting tags is updated to the
1959 /// value stored in the given machine location (in this frame's
1960 /// context).
1961 Context(ExceptionContextLoc),
1962}
1963
1964/// A location for a dynamic exception context value.
1965#[derive(Clone, Copy, Debug, PartialEq, Eq)]
1966#[cfg_attr(
1967 feature = "enable-serde",
1968 derive(serde_derive::Serialize, serde_derive::Deserialize)
1969)]
1970pub enum ExceptionContextLoc {
1971 /// An offset from SP at the callsite.
1972 SPOffset(u32),
1973 /// A GPR at the callsite. The physical register number for the
1974 /// GPR register file on the target architecture is used.
1975 GPR(u8),
1976}
1977
1978/// Metadata about a constant.
1979struct MachBufferConstant {
1980 /// A label which has not yet been bound which can be used for this
1981 /// constant.
1982 ///
1983 /// This is lazily created when a label is requested for a constant and is
1984 /// cleared when a constant is emitted.
1985 upcoming_label: Option<MachLabel>,
1986 /// Required alignment.
1987 align: CodeOffset,
1988 /// The byte size of this constant.
1989 size: usize,
1990}
1991
1992/// A trap that is deferred to the next time an island is emitted for either
1993/// traps, constants, or fixups.
1994struct MachLabelTrap {
1995 /// This label will refer to the trap's offset.
1996 label: MachLabel,
1997 /// The code associated with this trap.
1998 code: TrapCode,
1999 /// An optional source location to assign for this trap.
2000 loc: Option<RelSourceLoc>,
2001}
2002
2003/// A fixup to perform on the buffer once code is emitted. Fixups always refer
2004/// to labels and patch the code based on label offsets. Hence, they are like
2005/// relocations, but internal to one buffer.
2006#[derive(Debug)]
2007struct MachLabelFixup<I: VCodeInst> {
2008 /// The label whose offset controls this fixup.
2009 label: MachLabel,
2010 /// The offset to fix up / patch to refer to this label.
2011 offset: CodeOffset,
2012 /// The kind of fixup. This is architecture-specific; each architecture may have,
2013 /// e.g., several types of branch instructions, each with differently-sized
2014 /// offset fields and different places within the instruction to place the
2015 /// bits.
2016 kind: I::LabelUse,
2017}
2018
2019impl<I: VCodeInst> MachLabelFixup<I> {
2020 fn deadline(&self) -> CodeOffset {
2021 self.offset.saturating_add(self.kind.max_pos_range())
2022 }
2023}
2024
2025impl<I: VCodeInst> PartialEq for MachLabelFixup<I> {
2026 fn eq(&self, other: &Self) -> bool {
2027 self.deadline() == other.deadline()
2028 }
2029}
2030
2031impl<I: VCodeInst> Eq for MachLabelFixup<I> {}
2032
2033impl<I: VCodeInst> PartialOrd for MachLabelFixup<I> {
2034 fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
2035 Some(self.cmp(other))
2036 }
2037}
2038
2039impl<I: VCodeInst> Ord for MachLabelFixup<I> {
2040 fn cmp(&self, other: &Self) -> Ordering {
2041 other.deadline().cmp(&self.deadline())
2042 }
2043}
2044
2045/// A relocation resulting from a compilation.
2046#[derive(Clone, Debug, PartialEq)]
2047#[cfg_attr(
2048 feature = "enable-serde",
2049 derive(serde_derive::Serialize, serde_derive::Deserialize)
2050)]
2051pub struct MachRelocBase<T> {
2052 /// The offset at which the relocation applies, *relative to the
2053 /// containing section*.
2054 pub offset: CodeOffset,
2055 /// The kind of relocation.
2056 pub kind: Reloc,
2057 /// The external symbol / name to which this relocation refers.
2058 pub target: T,
2059 /// The addend to add to the symbol value.
2060 pub addend: i64,
2061}
2062
2063type MachReloc = MachRelocBase<RelocTarget>;
2064
2065/// A relocation resulting from a compilation.
2066pub type FinalizedMachReloc = MachRelocBase<FinalizedRelocTarget>;
2067
2068/// A Relocation target
2069#[derive(Debug, Clone, PartialEq, Eq, Hash)]
2070pub enum RelocTarget {
2071 /// Points to an [ExternalName] outside the current function.
2072 ExternalName(ExternalName),
2073 /// Points to a [MachLabel] inside this function.
2074 /// This is different from [MachLabelFixup] in that both the relocation and the
2075 /// label will be emitted and are only resolved at link time.
2076 ///
2077 /// There is no reason to prefer this over [MachLabelFixup] unless the ABI requires it.
2078 Label(MachLabel),
2079}
2080
2081impl From<ExternalName> for RelocTarget {
2082 fn from(name: ExternalName) -> Self {
2083 Self::ExternalName(name)
2084 }
2085}
2086
2087impl From<MachLabel> for RelocTarget {
2088 fn from(label: MachLabel) -> Self {
2089 Self::Label(label)
2090 }
2091}
2092
2093/// A Relocation target
2094#[derive(Debug, Clone, PartialEq, Eq, Hash)]
2095#[cfg_attr(
2096 feature = "enable-serde",
2097 derive(serde_derive::Serialize, serde_derive::Deserialize)
2098)]
2099pub enum FinalizedRelocTarget {
2100 /// Points to an [ExternalName] outside the current function.
2101 ExternalName(ExternalName),
2102 /// Points to a [CodeOffset] from the start of the current function.
2103 Func(CodeOffset),
2104}
2105
2106impl FinalizedRelocTarget {
2107 /// Returns a display for the current [FinalizedRelocTarget], with extra context to prettify the
2108 /// output.
2109 pub fn display<'a>(&'a self, params: Option<&'a FunctionParameters>) -> String {
2110 match self {
2111 FinalizedRelocTarget::ExternalName(name) => format!("{}", name.display(params)),
2112 FinalizedRelocTarget::Func(offset) => format!("func+{offset}"),
2113 }
2114 }
2115}
2116
2117/// A trap record resulting from a compilation.
2118#[derive(Clone, Debug, PartialEq)]
2119#[cfg_attr(
2120 feature = "enable-serde",
2121 derive(serde_derive::Serialize, serde_derive::Deserialize)
2122)]
2123pub struct MachTrap {
2124 /// The offset at which the trap instruction occurs, *relative to the
2125 /// containing section*.
2126 pub offset: CodeOffset,
2127 /// The trap code.
2128 pub code: TrapCode,
2129}
2130
2131/// A call site record resulting from a compilation.
2132#[derive(Clone, Debug, PartialEq)]
2133#[cfg_attr(
2134 feature = "enable-serde",
2135 derive(serde_derive::Serialize, serde_derive::Deserialize)
2136)]
2137pub struct MachCallSite {
2138 /// The offset of the call's return address, *relative to the
2139 /// start of the buffer*.
2140 pub ret_addr: CodeOffset,
2141
2142 /// The offset from the FP at this callsite down to the SP when
2143 /// the call occurs, if known. In other words, the size of the
2144 /// stack frame up to the saved FP slot. Useful to recover the
2145 /// start of the stack frame and to look up dynamic contexts
2146 /// stored in [`ExceptionContextLoc::SPOffset`].
2147 ///
2148 /// If `None`, the compiler backend did not specify a frame
2149 /// offset. The runtime in use with the compiled code may require
2150 /// the frame offset if exception handlers are present or dynamic
2151 /// context is used, but that is not Cranelift's concern: the
2152 /// frame offset is optional at this level.
2153 pub frame_offset: Option<u32>,
2154
2155 /// Range in `exception_handlers` corresponding to the exception
2156 /// handlers for this callsite.
2157 exception_handler_range: Range<u32>,
2158}
2159
2160/// A call site record resulting from a compilation.
2161#[derive(Clone, Debug, PartialEq)]
2162pub struct FinalizedMachCallSite<'a> {
2163 /// The offset of the call's return address, *relative to the
2164 /// start of the buffer*.
2165 pub ret_addr: CodeOffset,
2166
2167 /// The offset from the FP at this callsite down to the SP when
2168 /// the call occurs, if known. In other words, the size of the
2169 /// stack frame up to the saved FP slot. Useful to recover the
2170 /// start of the stack frame and to look up dynamic contexts
2171 /// stored in [`ExceptionContextLoc::SPOffset`].
2172 ///
2173 /// If `None`, the compiler backend did not specify a frame
2174 /// offset. The runtime in use with the compiled code may require
2175 /// the frame offset if exception handlers are present or dynamic
2176 /// context is used, but that is not Cranelift's concern: the
2177 /// frame offset is optional at this level.
2178 pub frame_offset: Option<u32>,
2179
2180 /// Exception handlers at this callsite, with target offsets
2181 /// *relative to the start of the buffer*.
2182 pub exception_handlers: &'a [FinalizedMachExceptionHandler],
2183}
2184
2185/// A patchable call site record resulting from a compilation.
2186#[derive(Clone, Debug, PartialEq)]
2187#[cfg_attr(
2188 feature = "enable-serde",
2189 derive(serde_derive::Serialize, serde_derive::Deserialize)
2190)]
2191pub struct MachPatchableCallSite {
2192 /// The offset of the call's return address (i.e., the address
2193 /// after the end of the patchable region), *relative to the start
2194 /// of the buffer*.
2195 pub ret_addr: CodeOffset,
2196
2197 /// The length of the region to be patched by NOP bytes.
2198 pub len: u32,
2199}
2200
2201/// A source-location mapping resulting from a compilation.
2202#[derive(PartialEq, Debug, Clone)]
2203#[cfg_attr(
2204 feature = "enable-serde",
2205 derive(serde_derive::Serialize, serde_derive::Deserialize)
2206)]
2207pub struct MachSrcLoc<T: CompilePhase> {
2208 /// The start of the region of code corresponding to a source location.
2209 /// This is relative to the start of the function, not to the start of the
2210 /// section.
2211 pub start: CodeOffset,
2212 /// The end of the region of code corresponding to a source location.
2213 /// This is relative to the start of the function, not to the start of the
2214 /// section.
2215 pub end: CodeOffset,
2216 /// The source location.
2217 pub loc: T::SourceLocType,
2218}
2219
2220impl MachSrcLoc<Stencil> {
2221 fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachSrcLoc<Final> {
2222 MachSrcLoc {
2223 start: self.start,
2224 end: self.end,
2225 loc: self.loc.expand(base_srcloc),
2226 }
2227 }
2228}
2229
2230/// Record of branch instruction in the buffer, to facilitate editing.
2231#[derive(Clone, Debug)]
2232struct MachBranch {
2233 start: CodeOffset,
2234 end: CodeOffset,
2235 target: MachLabel,
2236 fixup: usize,
2237 inverted: Option<SmallVec<[u8; 8]>>,
2238 /// All labels pointing to the start of this branch. For correctness, this
2239 /// *must* be complete (i.e., must contain all labels whose resolved offsets
2240 /// are at the start of this branch): we rely on being able to redirect all
2241 /// labels that could jump to this branch before removing it, if it is
2242 /// otherwise unreachable.
2243 labels_at_this_branch: SmallVec<[MachLabel; 4]>,
2244}
2245
2246impl MachBranch {
2247 fn is_cond(&self) -> bool {
2248 self.inverted.is_some()
2249 }
2250 fn is_uncond(&self) -> bool {
2251 self.inverted.is_none()
2252 }
2253}
2254
2255/// Stack-frame layout information carried through to machine
2256/// code. This provides sufficient information to interpret an active
2257/// stack frame from a running function, if provided.
2258#[derive(Clone, Debug, PartialEq)]
2259#[cfg_attr(
2260 feature = "enable-serde",
2261 derive(serde_derive::Serialize, serde_derive::Deserialize)
2262)]
2263pub struct MachBufferFrameLayout {
2264 /// Offset from bottom of frame to FP (near top of frame). This
2265 /// allows reading the frame given only FP.
2266 pub frame_to_fp_offset: u32,
2267 /// Offset from bottom of frame for each StackSlot,
2268 pub stackslots: SecondaryMap<ir::StackSlot, MachBufferStackSlot>,
2269}
2270
2271/// Descriptor for a single stack slot in the compiled function.
2272#[derive(Clone, Debug, PartialEq, Default)]
2273#[cfg_attr(
2274 feature = "enable-serde",
2275 derive(serde_derive::Serialize, serde_derive::Deserialize)
2276)]
2277pub struct MachBufferStackSlot {
2278 /// Offset from the bottom of the stack frame.
2279 pub offset: u32,
2280
2281 /// User-provided key to describe this stack slot.
2282 pub key: Option<ir::StackSlotKey>,
2283}
2284
2285/// Debug tags: a sequence of references to a stack slot, or a
2286/// user-defined value, at a particular PC.
2287#[derive(Clone, Debug, PartialEq)]
2288#[cfg_attr(
2289 feature = "enable-serde",
2290 derive(serde_derive::Serialize, serde_derive::Deserialize)
2291)]
2292pub(crate) struct MachDebugTags {
2293 /// Offset at which this tag applies.
2294 pub offset: CodeOffset,
2295
2296 /// Position on the attached instruction. This indicates whether
2297 /// the tags attach to the prior instruction (i.e., as a return
2298 /// point from a call) or the current instruction (i.e., as a PC
2299 /// seen during a trap).
2300 pub pos: MachDebugTagPos,
2301
2302 /// The range in the tag pool.
2303 pub range: Range<u32>,
2304}
2305
2306/// Debug tag position on an instruction.
2307///
2308/// We need to distinguish position on an instruction, and not just
2309/// use offsets, because of the following case:
2310///
2311/// ```plain
2312/// <tag1, tag2> call ...
2313/// <tag3, tag4> trapping_store ...
2314/// ```
2315///
2316/// If the stack is walked and interpreted with debug tags while
2317/// within the call, the PC seen will be the return point, i.e. the
2318/// address after the call. If the stack is walked and interpreted
2319/// with debug tags upon a trap of the following instruction, it will
2320/// be the PC of that instruction -- which is the same PC! Thus to
2321/// disambiguate which tags we want, we attach a "pre/post" flag to
2322/// every group of tags at an offset; and when we look up tags, we
2323/// look them up for an offset and "position" at that offset.
2324///
2325/// Thus there are logically two positions at every offset -- so the
2326/// above will be emitted as
2327///
2328/// ```plain
2329/// 0: call ...
2330/// 4, post: <tag1, tag2>
2331/// 4, pre: <tag3, tag4>
2332/// 4: trapping_store ...
2333/// ```
2334#[derive(Clone, Copy, Debug, PartialEq, Eq)]
2335#[cfg_attr(
2336 feature = "enable-serde",
2337 derive(serde_derive::Serialize, serde_derive::Deserialize)
2338)]
2339pub enum MachDebugTagPos {
2340 /// Tags attached after the instruction that ends at this offset.
2341 ///
2342 /// This is used to attach tags to a call, because the PC we see
2343 /// when walking the stack is the *return point*.
2344 Post,
2345 /// Tags attached before the instruction that starts at this offset.
2346 ///
2347 /// This is used to attach tags to every other kind of
2348 /// instruction, because the PC we see when processing a trap of
2349 /// that instruction is the PC of that instruction, not the
2350 /// following one.
2351 Pre,
2352}
2353
2354/// Iterator item for visiting debug tags.
2355pub struct MachBufferDebugTagList<'a> {
2356 /// Offset at which this tag applies.
2357 pub offset: CodeOffset,
2358
2359 /// Position at this offset ("post", attaching to prior
2360 /// instruction, or "pre", attaching to next instruction).
2361 pub pos: MachDebugTagPos,
2362
2363 /// The underlying tags.
2364 pub tags: &'a [DebugTag],
2365}
2366
2367/// Implementation of the `TextSectionBuilder` trait backed by `MachBuffer`.
2368///
2369/// Note that `MachBuffer` was primarily written for intra-function references
2370/// of jumps between basic blocks, but it's also quite usable for entire text
2371/// sections and resolving references between functions themselves. This
2372/// builder interprets "blocks" as labeled functions for the purposes of
2373/// resolving labels internally in the buffer.
2374pub struct MachTextSectionBuilder<I: VCodeInst> {
2375 buf: MachBuffer<I>,
2376 next_func: usize,
2377 force_veneers: ForceVeneers,
2378}
2379
2380impl<I: VCodeInst> MachTextSectionBuilder<I> {
2381 /// Creates a new text section builder which will have `num_funcs` functions
2382 /// pushed into it.
2383 pub fn new(num_funcs: usize) -> MachTextSectionBuilder<I> {
2384 let mut buf = MachBuffer::new();
2385 buf.reserve_labels_for_blocks(num_funcs);
2386 MachTextSectionBuilder {
2387 buf,
2388 next_func: 0,
2389 force_veneers: ForceVeneers::No,
2390 }
2391 }
2392}
2393
2394impl<I: VCodeInst> TextSectionBuilder for MachTextSectionBuilder<I> {
2395 fn append(
2396 &mut self,
2397 labeled: bool,
2398 func: &[u8],
2399 align: u32,
2400 ctrl_plane: &mut ControlPlane,
2401 ) -> u64 {
2402 // Conditionally emit an island if it's necessary to resolve jumps
2403 // between functions which are too far away.
2404 let size = func.len() as u32;
2405 if self.force_veneers == ForceVeneers::Yes || self.buf.island_needed(size) {
2406 self.buf
2407 .emit_island_maybe_forced(self.force_veneers, size, ctrl_plane);
2408 }
2409
2410 self.buf.align_to(align);
2411 let pos = self.buf.cur_offset();
2412 if labeled {
2413 self.buf.bind_label(
2414 MachLabel::from_block(BlockIndex::new(self.next_func)),
2415 ctrl_plane,
2416 );
2417 self.next_func += 1;
2418 }
2419 self.buf.put_data(func);
2420 u64::from(pos)
2421 }
2422
2423 fn resolve_reloc(&mut self, offset: u64, reloc: Reloc, addend: Addend, target: usize) -> bool {
2424 crate::trace!(
2425 "Resolving relocation @ {offset:#x} + {addend:#x} to target {target} of kind {reloc:?}"
2426 );
2427 let label = MachLabel::from_block(BlockIndex::new(target));
2428 let offset = u32::try_from(offset).unwrap();
2429 match I::LabelUse::from_reloc(reloc, addend) {
2430 Some(label_use) => {
2431 self.buf.use_label_at_offset(offset, label, label_use);
2432 true
2433 }
2434 None => false,
2435 }
2436 }
2437
2438 fn force_veneers(&mut self) {
2439 self.force_veneers = ForceVeneers::Yes;
2440 }
2441
2442 fn write(&mut self, offset: u64, data: &[u8]) {
2443 self.buf.data[offset.try_into().unwrap()..][..data.len()].copy_from_slice(data);
2444 }
2445
2446 fn finish(&mut self, ctrl_plane: &mut ControlPlane) -> Vec<u8> {
2447 // Double-check all functions were pushed.
2448 assert_eq!(self.next_func, self.buf.label_offsets.len());
2449
2450 // Finish up any veneers, if necessary.
2451 self.buf
2452 .finish_emission_maybe_forcing_veneers(self.force_veneers, ctrl_plane);
2453
2454 // We don't need the data any more, so return it to the caller.
2455 mem::take(&mut self.buf.data).into_vec()
2456 }
2457}
2458
2459// We use an actual instruction definition to do tests, so we depend on the `arm64` feature here.
2460#[cfg(all(test, feature = "arm64"))]
2461mod test {
2462 use cranelift_entity::EntityRef as _;
2463
2464 use super::*;
2465 use crate::ir::UserExternalNameRef;
2466 use crate::isa::aarch64::inst::{BranchTarget, CondBrKind, EmitInfo, Inst};
2467 use crate::isa::aarch64::inst::{OperandSize, xreg};
2468 use crate::machinst::{MachInstEmit, MachInstEmitState};
2469 use crate::settings;
2470
2471 fn label(n: u32) -> MachLabel {
2472 MachLabel::from_block(BlockIndex::new(n as usize))
2473 }
2474 fn target(n: u32) -> BranchTarget {
2475 BranchTarget::Label(label(n))
2476 }
2477
2478 #[test]
2479 fn test_elide_jump_to_next() {
2480 let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2481 let mut buf = MachBuffer::new();
2482 let mut state = <Inst as MachInstEmit>::State::default();
2483 let constants = Default::default();
2484
2485 buf.reserve_labels_for_blocks(2);
2486 buf.bind_label(label(0), state.ctrl_plane_mut());
2487 let inst = Inst::Jump { dest: target(1) };
2488 inst.emit(&mut buf, &info, &mut state);
2489 buf.bind_label(label(1), state.ctrl_plane_mut());
2490 let buf = buf.finish(&constants, state.ctrl_plane_mut());
2491 assert_eq!(0, buf.total_size());
2492 }
2493
2494 #[test]
2495 fn test_elide_trivial_jump_blocks() {
2496 let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2497 let mut buf = MachBuffer::new();
2498 let mut state = <Inst as MachInstEmit>::State::default();
2499 let constants = Default::default();
2500
2501 buf.reserve_labels_for_blocks(4);
2502
2503 buf.bind_label(label(0), state.ctrl_plane_mut());
2504 let inst = Inst::CondBr {
2505 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2506 taken: target(1),
2507 not_taken: target(2),
2508 };
2509 inst.emit(&mut buf, &info, &mut state);
2510
2511 buf.bind_label(label(1), state.ctrl_plane_mut());
2512 let inst = Inst::Jump { dest: target(3) };
2513 inst.emit(&mut buf, &info, &mut state);
2514
2515 buf.bind_label(label(2), state.ctrl_plane_mut());
2516 let inst = Inst::Jump { dest: target(3) };
2517 inst.emit(&mut buf, &info, &mut state);
2518
2519 buf.bind_label(label(3), state.ctrl_plane_mut());
2520
2521 let buf = buf.finish(&constants, state.ctrl_plane_mut());
2522 assert_eq!(0, buf.total_size());
2523 }
2524
2525 #[test]
2526 fn test_flip_cond() {
2527 let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2528 let mut buf = MachBuffer::new();
2529 let mut state = <Inst as MachInstEmit>::State::default();
2530 let constants = Default::default();
2531
2532 buf.reserve_labels_for_blocks(4);
2533
2534 buf.bind_label(label(0), state.ctrl_plane_mut());
2535 let inst = Inst::CondBr {
2536 kind: CondBrKind::Zero(xreg(0), OperandSize::Size64),
2537 taken: target(1),
2538 not_taken: target(2),
2539 };
2540 inst.emit(&mut buf, &info, &mut state);
2541
2542 buf.bind_label(label(1), state.ctrl_plane_mut());
2543 let inst = Inst::Nop4;
2544 inst.emit(&mut buf, &info, &mut state);
2545
2546 buf.bind_label(label(2), state.ctrl_plane_mut());
2547 let inst = Inst::Udf {
2548 trap_code: TrapCode::STACK_OVERFLOW,
2549 };
2550 inst.emit(&mut buf, &info, &mut state);
2551
2552 buf.bind_label(label(3), state.ctrl_plane_mut());
2553
2554 let buf = buf.finish(&constants, state.ctrl_plane_mut());
2555
2556 let mut buf2 = MachBuffer::new();
2557 let mut state = Default::default();
2558 let inst = Inst::TrapIf {
2559 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2560 trap_code: TrapCode::STACK_OVERFLOW,
2561 };
2562 inst.emit(&mut buf2, &info, &mut state);
2563 let inst = Inst::Nop4;
2564 inst.emit(&mut buf2, &info, &mut state);
2565
2566 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());
2567
2568 assert_eq!(buf.data, buf2.data);
2569 }
2570
2571 #[test]
2572 fn test_island() {
2573 let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2574 let mut buf = MachBuffer::new();
2575 let mut state = <Inst as MachInstEmit>::State::default();
2576 let constants = Default::default();
2577
2578 buf.reserve_labels_for_blocks(4);
2579
2580 buf.bind_label(label(0), state.ctrl_plane_mut());
2581 let inst = Inst::CondBr {
2582 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2583 taken: target(2),
2584 not_taken: target(3),
2585 };
2586 inst.emit(&mut buf, &info, &mut state);
2587
2588 buf.bind_label(label(1), state.ctrl_plane_mut());
2589 while buf.cur_offset() < 2000000 {
2590 if buf.island_needed(0) {
2591 buf.emit_island(0, state.ctrl_plane_mut());
2592 }
2593 let inst = Inst::Nop4;
2594 inst.emit(&mut buf, &info, &mut state);
2595 }
2596
2597 buf.bind_label(label(2), state.ctrl_plane_mut());
2598 let inst = Inst::Nop4;
2599 inst.emit(&mut buf, &info, &mut state);
2600
2601 buf.bind_label(label(3), state.ctrl_plane_mut());
2602 let inst = Inst::Nop4;
2603 inst.emit(&mut buf, &info, &mut state);
2604
2605 let buf = buf.finish(&constants, state.ctrl_plane_mut());
2606
2607 assert_eq!(2000000 + 8, buf.total_size());
2608
2609 let mut buf2 = MachBuffer::new();
2610 let mut state = Default::default();
2611 let inst = Inst::CondBr {
2612 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2613
2614 // This conditionally taken branch has a 19-bit constant, shifted
2615 // to the left by two, giving us a 21-bit range in total. Half of
2616 // this range positive so the we should be around 1 << 20 bytes
2617 // away for our jump target.
2618 //
2619 // There are two pending fixups by the time we reach this point,
2620 // one for this 19-bit jump and one for the unconditional 26-bit
2621 // jump below. A 19-bit veneer is 4 bytes large and the 26-bit
2622 // veneer is 20 bytes large, which means that pessimistically
2623 // assuming we'll need two veneers. Currently each veneer is
2624 // pessimistically assumed to be the maximal size which means we
2625 // need 40 bytes of extra space, meaning that the actual island
2626 // should come 40-bytes before the deadline.
2627 taken: BranchTarget::ResolvedOffset((1 << 20) - 20 - 20),
2628
2629 // This branch is in-range so no veneers should be needed, it should
2630 // go directly to the target.
2631 not_taken: BranchTarget::ResolvedOffset(2000000 + 4 - 4),
2632 };
2633 inst.emit(&mut buf2, &info, &mut state);
2634
2635 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());
2636
2637 assert_eq!(&buf.data[0..8], &buf2.data[..]);
2638 }
2639
2640 #[test]
2641 fn test_island_backward() {
2642 let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2643 let mut buf = MachBuffer::new();
2644 let mut state = <Inst as MachInstEmit>::State::default();
2645 let constants = Default::default();
2646
2647 buf.reserve_labels_for_blocks(4);
2648
2649 buf.bind_label(label(0), state.ctrl_plane_mut());
2650 let inst = Inst::Nop4;
2651 inst.emit(&mut buf, &info, &mut state);
2652
2653 buf.bind_label(label(1), state.ctrl_plane_mut());
2654 let inst = Inst::Nop4;
2655 inst.emit(&mut buf, &info, &mut state);
2656
2657 buf.bind_label(label(2), state.ctrl_plane_mut());
2658 while buf.cur_offset() < 2000000 {
2659 let inst = Inst::Nop4;
2660 inst.emit(&mut buf, &info, &mut state);
2661 }
2662
2663 buf.bind_label(label(3), state.ctrl_plane_mut());
2664 let inst = Inst::CondBr {
2665 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2666 taken: target(0),
2667 not_taken: target(1),
2668 };
2669 inst.emit(&mut buf, &info, &mut state);
2670
2671 let buf = buf.finish(&constants, state.ctrl_plane_mut());
2672
2673 assert_eq!(2000000 + 12, buf.total_size());
2674
2675 let mut buf2 = MachBuffer::new();
2676 let mut state = Default::default();
2677 let inst = Inst::CondBr {
2678 kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2679 taken: BranchTarget::ResolvedOffset(8),
2680 not_taken: BranchTarget::ResolvedOffset(4 - (2000000 + 4)),
2681 };
2682 inst.emit(&mut buf2, &info, &mut state);
2683 let inst = Inst::Jump {
2684 dest: BranchTarget::ResolvedOffset(-(2000000 + 8)),
2685 };
2686 inst.emit(&mut buf2, &info, &mut state);
2687
2688 let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());
2689
2690 assert_eq!(&buf.data[2000000..], &buf2.data[..]);
2691 }
2692
2693 #[test]
2694 fn test_multiple_redirect() {
2695 // label0:
2696 // cbz x0, label1
2697 // b label2
2698 // label1:
2699 // b label3
2700 // label2:
2701 // nop
2702 // nop
2703 // b label0
2704 // label3:
2705 // b label4
2706 // label4:
2707 // b label5
2708 // label5:
2709 // b label7
2710 // label6:
2711 // nop
2712 // label7:
2713 // ret
2714 //
2715 // -- should become:
2716 //
2717 // label0:
2718 // cbz x0, label7
2719 // label2:
2720 // nop
2721 // nop
2722 // b label0
2723 // label6:
2724 // nop
2725 // label7:
2726 // ret
2727
2728 let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2729 let mut buf = MachBuffer::new();
2730 let mut state = <Inst as MachInstEmit>::State::default();
2731 let constants = Default::default();
2732
2733 buf.reserve_labels_for_blocks(8);
2734
2735 buf.bind_label(label(0), state.ctrl_plane_mut());
2736 let inst = Inst::CondBr {
2737 kind: CondBrKind::Zero(xreg(0), OperandSize::Size64),
2738 taken: target(1),
2739 not_taken: target(2),
2740 };
2741 inst.emit(&mut buf, &info, &mut state);
2742
2743 buf.bind_label(label(1), state.ctrl_plane_mut());
2744 let inst = Inst::Jump { dest: target(3) };
2745 inst.emit(&mut buf, &info, &mut state);
2746
2747 buf.bind_label(label(2), state.ctrl_plane_mut());
2748 let inst = Inst::Nop4;
2749 inst.emit(&mut buf, &info, &mut state);
2750 inst.emit(&mut buf, &info, &mut state);
2751 let inst = Inst::Jump { dest: target(0) };
2752 inst.emit(&mut buf, &info, &mut state);
2753
2754 buf.bind_label(label(3), state.ctrl_plane_mut());
2755 let inst = Inst::Jump { dest: target(4) };
2756 inst.emit(&mut buf, &info, &mut state);
2757
2758 buf.bind_label(label(4), state.ctrl_plane_mut());
2759 let inst = Inst::Jump { dest: target(5) };
2760 inst.emit(&mut buf, &info, &mut state);
2761
2762 buf.bind_label(label(5), state.ctrl_plane_mut());
2763 let inst = Inst::Jump { dest: target(7) };
2764 inst.emit(&mut buf, &info, &mut state);
2765
2766 buf.bind_label(label(6), state.ctrl_plane_mut());
2767 let inst = Inst::Nop4;
2768 inst.emit(&mut buf, &info, &mut state);
2769
2770 buf.bind_label(label(7), state.ctrl_plane_mut());
2771 let inst = Inst::Ret {};
2772 inst.emit(&mut buf, &info, &mut state);
2773
2774 let buf = buf.finish(&constants, state.ctrl_plane_mut());
2775
2776 let golden_data = vec![
2777 0xa0, 0x00, 0x00, 0xb4, // cbz x0, 0x14
2778 0x1f, 0x20, 0x03, 0xd5, // nop
2779 0x1f, 0x20, 0x03, 0xd5, // nop
2780 0xfd, 0xff, 0xff, 0x17, // b 0
2781 0x1f, 0x20, 0x03, 0xd5, // nop
2782 0xc0, 0x03, 0x5f, 0xd6, // ret
2783 ];
2784
2785 assert_eq!(&golden_data[..], &buf.data[..]);
2786 }
2787
2788 #[test]
2789 fn test_handle_branch_cycle() {
2790 // label0:
2791 // b label1
2792 // label1:
2793 // b label2
2794 // label2:
2795 // b label3
2796 // label3:
2797 // b label4
2798 // label4:
2799 // b label1 // note: not label0 (to make it interesting).
2800 //
2801 // -- should become:
2802 //
2803 // label0, label1, ..., label4:
2804 // b label0
2805 let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2806 let mut buf = MachBuffer::new();
2807 let mut state = <Inst as MachInstEmit>::State::default();
2808 let constants = Default::default();
2809
2810 buf.reserve_labels_for_blocks(5);
2811
2812 buf.bind_label(label(0), state.ctrl_plane_mut());
2813 let inst = Inst::Jump { dest: target(1) };
2814 inst.emit(&mut buf, &info, &mut state);
2815
2816 buf.bind_label(label(1), state.ctrl_plane_mut());
2817 let inst = Inst::Jump { dest: target(2) };
2818 inst.emit(&mut buf, &info, &mut state);
2819
2820 buf.bind_label(label(2), state.ctrl_plane_mut());
2821 let inst = Inst::Jump { dest: target(3) };
2822 inst.emit(&mut buf, &info, &mut state);
2823
2824 buf.bind_label(label(3), state.ctrl_plane_mut());
2825 let inst = Inst::Jump { dest: target(4) };
2826 inst.emit(&mut buf, &info, &mut state);
2827
2828 buf.bind_label(label(4), state.ctrl_plane_mut());
2829 let inst = Inst::Jump { dest: target(1) };
2830 inst.emit(&mut buf, &info, &mut state);
2831
2832 let buf = buf.finish(&constants, state.ctrl_plane_mut());
2833
2834 let golden_data = vec![
2835 0x00, 0x00, 0x00, 0x14, // b 0
2836 ];
2837
2838 assert_eq!(&golden_data[..], &buf.data[..]);
2839 }
2840
2841 #[test]
2842 fn metadata_records() {
2843 let mut buf = MachBuffer::<Inst>::new();
2844 let ctrl_plane = &mut Default::default();
2845 let constants = Default::default();
2846
2847 buf.reserve_labels_for_blocks(3);
2848
2849 buf.bind_label(label(0), ctrl_plane);
2850 buf.put1(1);
2851 buf.add_trap(TrapCode::HEAP_OUT_OF_BOUNDS);
2852 buf.put1(2);
2853 buf.add_trap(TrapCode::INTEGER_OVERFLOW);
2854 buf.add_trap(TrapCode::INTEGER_DIVISION_BY_ZERO);
2855 buf.add_try_call_site(
2856 Some(0x10),
2857 [
2858 MachExceptionHandler::Tag(ExceptionTag::new(42), label(2)),
2859 MachExceptionHandler::Default(label(1)),
2860 ]
2861 .into_iter(),
2862 );
2863 buf.add_reloc(
2864 Reloc::Abs4,
2865 &ExternalName::User(UserExternalNameRef::new(0)),
2866 0,
2867 );
2868 buf.put1(3);
2869 buf.add_reloc(
2870 Reloc::Abs8,
2871 &ExternalName::User(UserExternalNameRef::new(1)),
2872 1,
2873 );
2874 buf.put1(4);
2875 buf.bind_label(label(1), ctrl_plane);
2876 buf.put1(0xff);
2877 buf.bind_label(label(2), ctrl_plane);
2878 buf.put1(0xff);
2879
2880 let buf = buf.finish(&constants, ctrl_plane);
2881
2882 assert_eq!(buf.data(), &[1, 2, 3, 4, 0xff, 0xff]);
2883 assert_eq!(
2884 buf.traps()
2885 .iter()
2886 .map(|trap| (trap.offset, trap.code))
2887 .collect::<Vec<_>>(),
2888 vec![
2889 (1, TrapCode::HEAP_OUT_OF_BOUNDS),
2890 (2, TrapCode::INTEGER_OVERFLOW),
2891 (2, TrapCode::INTEGER_DIVISION_BY_ZERO)
2892 ]
2893 );
2894 let call_sites: Vec<_> = buf.call_sites().collect();
2895 assert_eq!(call_sites[0].ret_addr, 2);
2896 assert_eq!(call_sites[0].frame_offset, Some(0x10));
2897 assert_eq!(
2898 call_sites[0].exception_handlers,
2899 &[
2900 FinalizedMachExceptionHandler::Tag(ExceptionTag::new(42), 5),
2901 FinalizedMachExceptionHandler::Default(4)
2902 ],
2903 );
2904 assert_eq!(
2905 buf.relocs()
2906 .iter()
2907 .map(|reloc| (reloc.offset, reloc.kind))
2908 .collect::<Vec<_>>(),
2909 vec![(2, Reloc::Abs4), (3, Reloc::Abs8)]
2910 );
2911 }
2912}