C# Disassembly

If you're ever curious as to how your C# code turns into executable code, sharplab.io is a very good place to start.

The Basics

The website, created by Andrey Shchekin, follows the pattern of other online compiler playgrounds, such as Tim Jones' Shader Playground, which handles HLSL, GLSL, and others, or Matt Godbolt's Compiler Explorer, which supports multiple languages and compilers (C and C++ for sure, but also CUDA, Go, Python, Rust, Swift, and a few others).

There's a pane where you can select a language and write or paste your code (you can see Roslyn-generated errors and warnings like unused using directives, which is nice), and then another pane where you can select the output you want to see.

The outputs at the moment of this writing are the following.

Interesting Example of C# Decompilation

While you may not think that compiling C# and then decompiling the result back into C# is a very useful thing, it's interesting to see what transformations are done by the compiler itself.

A classic example is the state machine for iterators. For example, this bit of code:

using System.Collections.Generic;

public class C {
  public IEnumerable<int> f() {
    yield return 1;
    yield return 2;
  }
}

will produce this output:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Security;
using System.Security.Permissions;

[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(
 DebuggableAttribute.DebuggingModes.Default |
 DebuggableAttribute.DebuggingModes.DisableOptimizations |
 DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints |
 DebuggableAttribute.DebuggingModes.EnableEditAndContinue)]
[assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
[assembly: AssemblyVersion("0.0.0.0")]
[module: UnverifiableCode]
public class C
{
    [CompilerGenerated]
    private sealed class <f>d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IDisposable, IEnumerator
    {
        private int <>1__state;

        private int <>2__current;

        private int <>l__initialThreadId;

        public C <>4__this;

        int IEnumerator<int>.Current
        {
            [DebuggerHidden]
            get
            {
                return <>2__current;
            }
        }

        object IEnumerator.Current
        {
            [DebuggerHidden]
            get
            {
                return <>2__current;
            }
        }

        [DebuggerHidden]
        public <f>d__0(int <>1__state)
        {
            this.<>1__state = <>1__state;
            <>l__initialThreadId = Environment.CurrentManagedThreadId;
        }

        [DebuggerHidden]
        void IDisposable.Dispose()
        {
        }

        private bool MoveNext()
        {
            switch (<>1__state)
            {
                default:
                    return false;
                case 0:
                    <>1__state = -1;
                    <>2__current = 1;
                    <>1__state = 1;
                    return true;
                case 1:
                    <>1__state = -1;
                    <>2__current = 2;
                    <>1__state = 2;
                    return true;
                case 2:
                    <>1__state = -1;
                    return false;
            }
        }

        bool IEnumerator.MoveNext()
        {
            //ILSpy generated this explicit interface implementation from .override directive in MoveNext
            return this.MoveNext();
        }

        [DebuggerHidden]
        void IEnumerator.Reset()
        {
            throw new NotSupportedException();
        }

        [DebuggerHidden]
        IEnumerator<int> IEnumerable<int>.GetEnumerator()
        {
            <f>d__0 <f>d__;
            if (<>1__state == -2 && <>l__initialThreadId == Environment.CurrentManagedThreadId)
            {
                <>1__state = 0;
                <f>d__ = this;
            }
            else
            {
                <f>d__ = new <f>d__0(0);
                <f>d__.<>4__this = <>4__this;
            }
            return <f>d__;
        }

        [DebuggerHidden]
        IEnumerator IEnumerable.GetEnumerator()
        {
            return ((IEnumerable<int>)this).GetEnumerator();
        }
    }

    [IteratorStateMachine(typeof(<f>d__0))]
    public IEnumerable<int> f()
    {
        <f>d__0 <f>d__ = new <f>d__0(-2);
        <f>d__.<>4__this = this;
        return <f>d__;
    }
}

So, there are a bunch of attributes at the top that apply to the assembly itself, such as CompilationRelaxations.

The more interesting thing to look at, however, is the very last method. f now simply creates a new instance of a compiler-generated type and returns it. That type keeps a <>1__state field: it starts at -2 when the enumerable is created, is set to 0 when an enumerator is requested through GetEnumerator, and is then updated together with <>2__current on every call to MoveNext as the sequence is iterated.
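
The generated names (<f>d__0, <>1__state, and so on) are deliberately not legal C# identifiers, so the decompiled output can't be pasted back in as-is. As a rough illustration of the same idea, here is a minimal hand-written sketch. The names HandWrittenIterator, _state, _current and IteratorDemo are made up for this example, and the sketch leaves out the -2 state and the thread-id check that let the real code reuse the first instance.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the compiler-generated <f>d__0 class.
// It omits the initial-thread check and the -2 "not yet enumerated" state.
public sealed class HandWrittenIterator : IEnumerable<int>, IEnumerator<int>
{
    private int _state;   // plays the role of <>1__state
    private int _current; // plays the role of <>2__current

    public int Current => _current;
    object IEnumerator.Current => _current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0: _current = 1; _state = 1; return true; // yield return 1;
            case 1: _current = 2; _state = 2; return true; // yield return 2;
            default: _state = -1; return false;            // end of the method body
        }
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }

    // Always hands out a fresh enumerator; the real code reuses `this` when it can.
    public IEnumerator<int> GetEnumerator() => new HandWrittenIterator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class IteratorDemo
{
    // Collects the values the same way any foreach loop over f() would.
    public static List<int> Collect()
    {
        var result = new List<int>();
        foreach (int value in new HandWrittenIterator())
        {
            result.Add(value);
        }
        return result;
    }
}
```

Calling IteratorDemo.Collect() walks the switch once per MoveNext call, which is the same bookkeeping the compiler writes for you when you use yield return.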

Interesting Example of Jit ASM Decompilation

OK, we're going to look at three different ways of iterating through an array of strings and printing out its elements. Make sure you set the output pane to Release and not Debug.

using System;

public static class C {
    public static void IterEval(string[] values) {
        for (int i = 0; i < values.Length; i++) {
            Console.WriteLine(values[i]);
        }
    }
    public static void IterForeach(string[] values) {
        foreach (var i in values) {
            Console.WriteLine(i);
        }
    }
    public static void IterAssigned(string[] values) {
        int l = values.Length;
        for (int i = 0; i < l; i++) {
            Console.WriteLine(values[i]);
        }
    }
}

There are three methods: IterEval evaluates values.Length directly in the for loop condition. IterForeach uses a foreach loop. IterAssigned is the same as IterEval, but values.Length is only evaluated once, outside the loop.

This is what each of these looks like.

C.IterEval(System.String[])
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push edi
    L0004: push esi
    L0005: push ebx
    L0006: mov esi, ecx
    L0008: xor edi, edi
    L000a: mov ebx, [esi+4]
    L000d: test ebx, ebx
    L000f: jle short L001f
    L0011: mov ecx, [esi+edi*4+8]
    L0015: call System.Console.WriteLine(System.String)
    L001a: inc edi
    L001b: cmp ebx, edi
    L001d: jg short L0011
    L001f: pop ebx
    L0020: pop esi
    L0021: pop edi
    L0022: pop ebp
    L0023: ret

IterEval has the following instructions.

C.IterForeach(System.String[])
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push edi
    L0004: push esi
    L0005: push ebx
    L0006: mov esi, ecx
    L0008: xor edi, edi
    L000a: mov ebx, [esi+4]
    L000d: test ebx, ebx
    L000f: jle short L001f
    L0011: mov ecx, [esi+edi*4+8]
    L0015: call System.Console.WriteLine(System.String)
    L001a: inc edi
    L001b: cmp ebx, edi
    L001d: jg short L0011
    L001f: pop ebx
    L0020: pop esi
    L0021: pop edi
    L0022: pop ebp
    L0023: ret

Turns out that even though conceptually foreach is a very different beast from a for loop (with its use of enumerators and whatnot), the compiler recognizes we're iterating over an array and generates code identical to that of the straightforward for loop.
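
Part of this happens before the JIT is involved: for arrays, the C# compiler emits index-based IL instead of calling GetEnumerator. The sketch below approximates the two lowerings in plain C#. The class and method names are hypothetical, the loops sum string lengths instead of printing so the result is easy to check, and the real generated code differs in details (for example, local names and null checks around Dispose).

```csharp
using System;
using System.Collections.Generic;

public static class ForeachLowering
{
    // Roughly what `foreach (var s in values)` becomes when values is an array:
    // a copy of the reference, an index, and a plain loop over Length.
    public static int SumLengthsArray(string[] values)
    {
        int total = 0;
        string[] array = values;
        for (int index = 0; index < array.Length; index++)
        {
            string s = array[index];
            total += s.Length;
        }
        return total;
    }

    // Roughly what foreach becomes for a general IEnumerable<string>:
    // an enumerator, MoveNext/Current, and a Dispose in a finally block.
    public static int SumLengthsEnumerable(IEnumerable<string> values)
    {
        int total = 0;
        IEnumerator<string> enumerator = values.GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                string s = enumerator.Current;
                total += s.Length;
            }
        }
        finally
        {
            enumerator.Dispose();
        }
        return total;
    }
}
```

Pasting both methods into the site and comparing their JIT output is a quick way to see how much extra work the enumerator-based form carries.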

C.IterAssigned(System.String[])
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push edi
    L0004: push esi
    L0005: push ebx

    L0006: mov esi, ecx
    L0008: mov edi, [esi+4]
    L000b: xor ebx, ebx
    L000d: test edi, edi
    L000f: jle short L003c

    L0011: test edi, edi
    L0013: setge cl
    L0016: movzx ecx, cl
    L0019: test cl, 1
    L001c: je short L002e

    L001e: mov ecx, [esi+ebx*4+8]
    L0022: call System.Console.WriteLine(System.String)
    L0027: inc ebx
    L0028: cmp ebx, edi
    L002a: jl short L001e

    L002c: jmp short L003c

    L002e: mov ecx, [esi+ebx*4+8]
    L0032: call System.Console.WriteLine(System.String)
    L0037: inc ebx
    L0038: cmp ebx, edi
    L003a: jl short L002e

    L003c: pop ebx
    L003d: pop esi
    L003e: pop edi
    L003f: pop ebp
    L0040: ret

Now, here we see some differences at last. I've added some extra line breaks this time. This is what's happening in IterAssigned, in comparison to the prior two functions.

Here, I find myself a bit stumped - I can't quite say why all of this remains in the code.

OK, so I managed to confuse myself and/or the compiler a bit, but we still didn't get a bounds check. Let's try this instead.

...
    public static void IterArg(string[] values, int l) {
        for (int i = 0; i < l; i++) {
            Console.WriteLine(values[i]);
        }
    }
...

And then, hey presto!

C.IterArg(System.String[], Int32)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push edi
    L0004: push esi
    L0005: push ebx
    L0006: mov esi, ecx
    L0008: mov edi, edx
    L000a: xor ebx, ebx
    L000c: test edi, edi
    L000e: jle short L004c
    L0010: test esi, esi
    L0012: je short L0039
    L0014: cmp [esi+4], edi
    L0017: setge cl
    L001a: movzx ecx, cl
    L001d: test edi, edi
    L001f: setge al
    L0022: movzx eax, al
    L0025: test eax, ecx
    L0027: je short L0039
    L0029: mov ecx, [esi+ebx*4+8]
    L002d: call System.Console.WriteLine(System.String)
    L0032: inc ebx
    L0033: cmp ebx, edi
    L0035: jl short L0029
    L0037: jmp short L004c
    L0039: cmp ebx, [esi+4]
    L003c: jae short L0051
    L003e: mov ecx, [esi+ebx*4+8]
    L0042: call System.Console.WriteLine(System.String)
    L0047: inc ebx
    L0048: cmp ebx, edi
    L004a: jl short L0039
    L004c: pop ebx
    L004d: pop esi
    L004e: pop edi
    L004f: pop ebp
    L0050: ret
    L0051: call 0x71b775b0
    L0056: int3

I'm not going to go into the detailed disassembly this time, as the patterns are roughly the same as before.
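
For orientation, the IterArg listing can be read as two copies of the same loop behind an up-front test, a pattern the JIT usually refers to as loop cloning. The following is only a C#-level picture of that shape, not something the compiler produces: the class and method names are hypothetical, it counts elements instead of printing them, and both branches are identical C# because the language has no way to spell "skip the range check". The comments map each part back to the labels in the listing.

```csharp
using System;

public static class LoopCloningSketch
{
    // Hypothetical C#-level picture of the two loops in the IterArg listing.
    // Both branches behave identically in C#; the comments mark where the
    // JIT-generated machine code differs.
    public static int CountVisited(string[] values, int l)
    {
        int visited = 0;
        if (l <= 0) return visited; // L000c-L000e: nothing to do

        // L0010-L0027: is the array non-null, and is every index in [0, l) valid?
        if (values != null && values.Length >= l && l >= 0)
        {
            // Fast copy (L0029-L0035): all indices are already known to be valid,
            // so the machine code has no per-element range check.
            for (int i = 0; i < l; i++) { _ = values[i]; visited++; }
        }
        else
        {
            // Slow copy (L0039-L004a): each access compares i with values.Length
            // and jumps to the throw helper at L0051 when it is out of range.
            for (int i = 0; i < l; i++) { _ = values[i]; visited++; }
        }
        return visited;
    }
}
```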

I will, however, call attention to L003c, which jumps past the return on L0050 and onto L0051, which calls an external (fixed-address, known to the JIT) error handler, followed by a debug break interrupt.
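
From the C# side, that failed range check is what surfaces as an IndexOutOfRangeException. Here is a small sketch to see it in action, assuming a wrapper class named BoundsCheckDemo and a helper named ThrowsOnOverrun, both made up for this example:

```csharp
using System;

public static class BoundsCheckDemo
{
    // Same method as in the listing above.
    public static void IterArg(string[] values, int l)
    {
        for (int i = 0; i < l; i++)
        {
            Console.WriteLine(values[i]);
        }
    }

    // Returns true when reading past the end surfaces as IndexOutOfRangeException.
    public static bool ThrowsOnOverrun()
    {
        try
        {
            IterArg(new[] { "a", "b" }, 3); // l is one past the end of the array
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }
}
```

The first two elements are printed normally; the exception is only raised when the loop reaches the index that fails the comparison against the array length.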

Learning More

A great way of learning more is looking at how other interesting constructs are handled, like async and await, throwing and handling exceptions, or switches with patterns.
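
As a possible starting point, here is a small snippet to paste into the code pane. The class and method names are hypothetical; the interesting part is switching the output pane between C#, IL, and JIT ASM and watching how each construct is rewritten.

```csharp
using System;
using System.Threading.Tasks;

public static class MoreToExplore
{
    // async/await: the compiler generates a state machine for this method,
    // in the same spirit as the iterator example earlier.
    public static async Task<int> DelayedLengthAsync(string text)
    {
        await Task.Delay(1);
        return text.Length;
    }

    // Exceptions: look at how the try/catch region is represented in IL.
    public static int ParseOrDefault(string text, int fallback)
    {
        try
        {
            return int.Parse(text);
        }
        catch (FormatException)
        {
            return fallback;
        }
    }

    // Switch expression with patterns: lowered into type tests and comparisons.
    public static string Describe(object value) => value switch
    {
        int n when n < 0 => "negative int",
        int _ => "non-negative int",
        string s => "string of length " + s.Length,
        null => "null",
        _ => "something else",
    };
}
```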

If you're curious, the GitHub repo has the sources for the website.

Happy decompiling!

References

It's been a while since I looked at x86, so I ended up using a few references.
