From binary to working Go: reconstructing a malware sample with AI coding agents

30 May, 2026

When ChatGPT was released, it was helpful in some cases to copy decompiled pseudo-code from IDA Pro into ChatGPT and get faster feedback on how the pseudo-code could be translated into real Go code. Reading real Go code is a lot faster than interpreting pseudo-code. Over time, ChatGPT and Claude have both improved their analysis capabilities.

Around the middle of 2025, frontier labs started releasing coding agents, notably Claude Code. Now it seems a new coding agent is released on a weekly basis.

I started trying Claude Code to help with the reversing and decompilation process, but I wanted to get real Go code, not pseudo-code. Initially, the results were not great, partly due to guardrails that in many cases caused the models to refuse to work on malware. This could be worked around in some cases, but the results were still not impressive.

Since autumn 2025, though, something has changed. The models got noticeably better, the guardrails were better tuned to cyber work, and OpenAI launched Trusted Access for Cyber while Anthropic launched its Cyber Verification Program (CVP). Both are intended to adjust the guardrails for cyber professionals doing defensive work.

Let’s look at one example.

Reconstructing the source code

The sample is a piece of Go malware. libgojni.so is the native component of an Android app, compiled with golang.org/x/mobile (Go Mobile) so the app can call into it over JNI. It is a 32-bit x86 c-shared binary and, because Go binaries carry their own metadata whether you want them to or not, far more legible than a stripped C binary of the same size.

The binary

Before reading a single line of decompiler output, a few read-only tools establish ground truth. With a Go binary, this step pays for itself immediately, because the toolchain leaves the function names, file names, and line ranges sitting in the pclntab.

redress info   libgojni.so   # Go toolchain version, build flags, module graph
go version -m  libgojni.so   # embedded module version information
redress source libgojni.so   # file names and line ranges for every function
nm / objdump                 # symbol table, raw disassembly spot-checks
binwalk        libgojni.so   # embedded signatures (AES S-boxes at 0x8B0F40)

Field	Value
Compiler	`go1.25.1` (2025-09-03)
Target	`GOOS=android GOARCH=386` (x86, 32-bit)
Build mode	`-buildmode=c-shared` (CGO enabled)
Module path	`mobile_client`
Notable deps	`quic-go v0.55.0`, `golang.org/x/crypto v0.43.0`, `gopsutil/v3 v3.24.5`
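
The compiler, build-setting and dependency rows can also be read programmatically. The sketch below uses the standard library's debug/buildinfo package, which reads the same embedded data that go version -m prints. The function names describe and describeSelf are mine, and the output layout is an assumption for illustration, not what redress prints.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
)

// describe reads the build information that the Go linker embeds in a
// binary and returns it as printable lines.
func describe(path string) ([]string, error) {
	info, err := buildinfo.ReadFile(path)
	if err != nil {
		return nil, err
	}
	lines := []string{
		"compiler: " + info.GoVersion,
		"module: " + info.Main.Path,
	}
	// Settings hold keys such as GOOS, GOARCH, CGO_ENABLED and -buildmode.
	for _, s := range info.Settings {
		lines = append(lines, fmt.Sprintf("setting: %s=%s", s.Key, s.Value))
	}
	for _, d := range info.Deps {
		lines = append(lines, fmt.Sprintf("dep: %s %s", d.Path, d.Version))
	}
	return lines, nil
}

// describeSelf runs describe on the running executable, a convenient
// smoke test when the sample is not at hand.
func describeSelf() ([]string, error) {
	exe, err := os.Executable()
	if err != nil {
		return nil, err
	}
	return describe(exe)
}

func main() {
	var (
		lines []string
		err   error
	)
	if len(os.Args) > 1 {
		lines, err = describe(os.Args[1]) // e.g. libgojni.so
	} else {
		lines, err = describeSelf()
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Run without arguments it inspects itself; given a path, it inspects that file instead.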

For decompilation, I used IDA Pro, not Binary Ninja. Both are capable decompilers, but IDA currently resolves the internal Go symbols from the pclntab structure that the linker embeds, and Binja does not yet. On a Go binary, that is the difference between a named call graph and a wall of sub_4xxxxx. redress source gives you the same map a second way: every function name, its source file, and its line range, which means the package structure is known before any code is written.

What the agent actually sees

The input is deliberately narrow. The agent gets:

the IDA Pro pseudo-C, one .dec file per source file;
the redress source function-to-file map, so packages and file boundaries are fixed up front;
a captured C2 session with real ciphertext blobs, used as test vectors;
a task spec that draws two lines.

The build target is the local Linux host, not Android, so the CGO bindings and Go Mobile glue (_cgo_gotypes.go, seq_android.go) are out. Only the actual application code is reconstructed.

A worked example: key derivation

The most useful test of whether an agent can really read decompiler output is a function where the answer is not obvious from the decompiled code. The encryption uses AES-256-GCM with a 12-byte prepended nonce, and the key derivation is the kind of thing that is easy to get almost right.

Here is the IDA pseudo-C for the key setup:

if ( a4 != 0 || a5 != 0 )
{
  v23 = runtime_convTstring(a6, a7);
  v12 = runtime_int64div(a4, a5, 5, 0);
  v25 = (void *)runtime_convT64(v12, v14);
  v15 = fmt_Sprintf((int)"%s%v", 4, (int)&v22, 2, 2);
}
else
{
  v23 = runtime_convTstring(a6, a7);
  v25 = &unk_FF5B8;
  v15 = fmt_Sprintf((int)"%s%v", 4, (int)&v22, 2, 2);
}
if ( v8 < 0x20 )
  runtime_panicSliceB(v8);
v11 = crypto_aes_NewCipher((int)v7 + v8 - 32, 32, 32);

Four details decide the reconstruction, and none of them are spelled out:

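a4/a5 are the low and high DWORDs of the int64 token, split by the 32-bit x86 calling convention. They are one argument, not two. The divisor confirms the order: the trailing 5, 0 is the int64 constant 5 passed as low DWORD 5, high DWORD 0.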
runtime_int64div(a4, a5, 5, 0) is token / 5.
&unk_FF5B8 is the integer constant 0 in the data segment; under %v it renders as the string "0".
(int)v7 + v8 - 32 with the v8 < 0x20 guard is []byte(keyStr)[len(keyStr)-32:], the last 32 bytes, with a length check.
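
The arithmetic in those four points is small enough to check in isolation. This is a minimal sketch, not the reconstructed source: joinDwords, keyString and last32 are names I made up, and the key base and token in main are hypothetical values. The halves are passed as named parameters, so the sketch does not depend on which stack slot holds which half.

```go
package main

import (
	"errors"
	"fmt"
)

// joinDwords rebuilds the int64 that a 32-bit x86 decompilation shows
// as two separate 32-bit values.
func joinDwords(lo, hi uint32) int64 {
	return int64(uint64(hi)<<32 | uint64(lo))
}

// keyString mirrors the two branches of the pseudo-C: keyBase followed
// by "0" for a zero token, keyBase followed by token/5 otherwise.
func keyString(keyBase string, token int64) string {
	if token == 0 {
		return fmt.Sprintf("%s%v", keyBase, 0)
	}
	return fmt.Sprintf("%s%v", keyBase, token/5)
}

// last32 returns the final 32 bytes of the key string, with the same
// length guard as the v8 < 0x20 check (an error instead of a panic).
func last32(keyStr string) ([]byte, error) {
	if len(keyStr) < 32 {
		return nil, errors.New("key string shorter than 32 bytes")
	}
	return []byte(keyStr)[len(keyStr)-32:], nil
}

func main() {
	token := joinDwords(50, 0) // hypothetical token value
	ks := keyString("hypothetical-key-base-0123456789abcdef", token)
	key, err := last32(ks)
	fmt.Println(token, ks, len(key), err)
}
```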

The Encrypting counterpart is where reading carefully pays off: both branches of its if/else are byte-for-byte identical, which means Encrypting always appends "0" regardless of the token. Miss that and you produce a function that looks plausible and decrypts nothing.

Reconstructed:

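func Decrypting(ciphertext []byte, keyBase string, token int64) ([]byte, error) {
    // The key string is keyBase followed by "0" or by token/5 in decimal.
    var keyStr string
    if token == 0 {
        keyStr = fmt.Sprintf("%s%v", keyBase, 0)
    } else {
        keyStr = fmt.Sprintf("%s%v", keyBase, strconv.FormatInt(token/5, 10))
    }
    if len(keyStr) < 32 {
        return nil, errors.New("key too short")
    }
    // AES-256 key: the last 32 bytes of the key string.
    key := []byte(keyStr)[len(keyStr)-32:]
    block, err := aes.NewCipher(key)
    if err != nil {
        return nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }
    // The nonce is prepended; reject blobs too short to hold one.
    if len(ciphertext) < gcm.NonceSize() {
        return nil, errors.New("ciphertext too short")
    }
    nonce, data := ciphertext[:gcm.NonceSize()], ciphertext[gcm.NonceSize():]
    return gcm.Open(nil, nonce, data, nil)
}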

The keyBase itself is built differently per context: userHash[:5] + appHash[5:] for login and target traffic, and a timestamp-derived value for the inner layer of result reporting. That was confirmed the only way it can be: by decrypting a captured payload and checking the recovered fields against a packet capture.

Those blobs are embedded as constants in a *_test.go file and used as test vectors: the full key-derivation, AES-GCM decrypt, JSON unmarshal pipeline is exercised against data the original binary produced. If the reconstruction is wrong by one byte, these fail.
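
The shape of such a check is sketched below, under stated assumptions. The captured blobs are not reproduced here, so seal generates a stand-in vector locally, following the description of Encrypting (always the "0" suffix, nonce prepended). The helper names, the key base and the JSON field "status" are all hypothetical. In a real *_test.go the blob would be a constant taken from the capture.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/json"
	"errors"
	"fmt"
)

// newGCM derives the AES-256 key (last 32 bytes of keyBase + token/5).
// token/5 is 0 for a zero token, so one format call covers both branches.
func newGCM(keyBase string, token int64) (cipher.AEAD, error) {
	keyStr := fmt.Sprintf("%s%v", keyBase, token/5)
	if len(keyStr) < 32 {
		return nil, errors.New("key too short")
	}
	block, err := aes.NewCipher([]byte(keyStr)[len(keyStr)-32:])
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal produces a stand-in vector: nonce || ciphertext, keyed with the
// "0" suffix.
func seal(plaintext []byte, keyBase string) ([]byte, error) {
	gcm, err := newGCM(keyBase, 0)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the prepended nonce and authenticates the rest.
func open(blob []byte, keyBase string, token int64) ([]byte, error) {
	gcm, err := newGCM(keyBase, token)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("blob too short")
	}
	return gcm.Open(nil, blob[:gcm.NonceSize()], blob[gcm.NonceSize():], nil)
}

// checkVector runs the whole pipeline: derive key, decrypt, unmarshal,
// then compare one recovered field against the expected value.
func checkVector(blob []byte, keyBase string, token int64, field, want string) error {
	plain, err := open(blob, keyBase, token)
	if err != nil {
		return fmt.Errorf("decrypt: %w", err)
	}
	var msg map[string]any
	if err := json.Unmarshal(plain, &msg); err != nil {
		return fmt.Errorf("unmarshal: %w", err)
	}
	if got, _ := msg[field].(string); got != want {
		return fmt.Errorf("field %q = %q, want %q", field, got, want)
	}
	return nil
}

func main() {
	keyBase := "0123456789abcdef0123456789abcdef" // hypothetical key base
	blob, err := seal([]byte(`{"status":"ok"}`), keyBase)
	if err != nil {
		panic(err)
	}
	fmt.Println(checkVector(blob, keyBase, 0, "status", "ok"))
}
```

Because GCM authenticates the ciphertext, a wrong key derivation does not yield garbage that happens to parse; the decrypt step fails outright, which is what makes captured blobs such strict test vectors.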

Strings, by length

A smaller forensic detail, same idea. The decompiler reports string lengths, and those lengths disambiguate strings you might otherwise copy incorrectly.

Header	Value	Length
Accept (attack / login)	`text/html,application/xhtml+xml,application/xml,`	48
Accept (result reporting)	`…application/xml,application/json`	64
Accept-Encoding (attack)	`gzip, deflate, br`	17
Accept-Encoding (health check)	`text`	4

The 48-vs-64 Accept header is also a network-level indicator that separates attack traffic from reporting traffic, which is the kind of thing this whole exercise exists to surface.
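
As a rough illustration of how that indicator could be used, the sketch below labels a request by the length of its Accept header alone. The function name and the labels are mine. Length by itself is a weak signal, so a real detection rule would more likely match the exact header values; this only shows the 48-versus-64 split from the table.

```go
package main

import "fmt"

// Lengths of the two Accept header values, as reported by the decompiler.
const (
	acceptLenAttackLogin = 48
	acceptLenReporting   = 64
)

// classifyAccept labels traffic by Accept header length only.
func classifyAccept(accept string) string {
	switch len(accept) {
	case acceptLenAttackLogin:
		return "attack/login"
	case acceptLenReporting:
		return "reporting"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(classifyAccept("text/html,application/xhtml+xml,application/xml,"))
	fmt.Println(classifyAccept("*/*"))
}
```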

The Claude Code run

The most recent attempt used Claude Code in its default configuration: Sonnet 4.6 as the main model, Opus 4.7 in advisor mode. Given the same plain-text decompiler output and the redress map, it produced every function and file called for by the decompilation.

For the first time, nothing was left as accidental dead code. The only gaps were the offensive parts, and those were deliberate. They were documented stubs: // STUB: this code is malicious and therefore not implemented, with the surrounding variables kept in place so the package still compiles. The functional behaviour matches the binary. The model occasionally chose a different implementation or added logging and extra checks, which for this purpose is fine; functional equivalence is the bar, not byte-for-byte mimicry.

Component	Status
AES-256-GCM encrypt / decrypt	complete
Key derivation	complete
Login handshake	complete
Random string generation	complete
Request construction	complete
Target fetch + decrypt	complete
Result reporting	complete
Health / IP check	complete
`HttpJob` flood worker	stub
`NGINXLoris.Flood`	stub
`TargetWorker` dispatch loop	stub
App CLI	complete

The three stubs keep their signatures, their struct definitions, and a comment describing what the original did: the transport dispatch by target.Type, the Slowloris connection-holding pattern, and the goroutine fan-out. They contain no working attack loop. That is enough to analyse, but not enough to run an attack.
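
For readers who have not seen one, a documented stub of this kind might look like the sketch below. Only the names NGINXLoris and Flood and the STUB comment come from the run; the struct field, the method signature and the ErrStub sentinel are hypothetical, and there is deliberately no behaviour behind them.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrStub is returned by every stubbed offensive function (hypothetical
// sentinel, added so callers can tell a stub from a real failure).
var ErrStub = errors.New("stub: offensive code not implemented")

// NGINXLoris keeps the type name from the binary. The field is a
// placeholder so that surrounding code still compiles.
type NGINXLoris struct {
	Target string
}

// Flood is a documented stub.
// Original behaviour, described but not implemented: the Slowloris
// connection-holding pattern.
func (n *NGINXLoris) Flood() error {
	// STUB: this code is malicious and therefore not implemented
	return ErrStub
}

func main() {
	err := (&NGINXLoris{Target: "example.invalid"}).Flood()
	fmt.Println(errors.Is(err, ErrStub))
}
```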

This is the first run where I am convinced the agent can reconstruct a decompiled binary in full, not most of it. That is not the same as saying there are no limits left. There are.

A note on Grok

I ran the same input through Grok Build (CLI). It needed many more iterations to get there, and then reconstructed the offensive code that every other run left stubbed. Same binary, same plain-text decompilation, and no instruction to implement the attack loops; it implemented them anyway.

The challenges

The failure modes have been consistent across agents and versions, and most of them are not about whether the model can write Go:

Model lock-in. Some harnesses, such as Antigravity, did not let me change the model, which made A/B comparison impossible.
Token cost. Go binaries are large, so you usually include only the functions in question. For a mobile app, that is fiddly but doable because Go is cross-platform.
Iteration discipline. “Do task 1, then 2, then 3” is not always followed. Smaller tasks with explicit iteration work better, even when the agent should be able to plan it itself.
Coverage. Not all functions get written on the first pass; some are generated completely, some only barely, and some not at all. More iterations close the gap.
Prompt and diff hygiene. Being able to revise prompts and use Git to track what each iteration changed is what turns this from a demo into a method.
Expecting too much. The agents can compile and run the toolchain and iterate until the build is green; they are less reliable at deciding when the job is actually finished.

A recurring positive: agents that write *_test.go files to validate their own implementation tend to converge faster, because the test is a fixed point the iteration can pull toward.

Final observations

Preparing input to a coding agent is vital; do not just dump a binary directly into the agent. Either prepare the input before starting, or equip the coding agent with the right skills. In particular, feeding the coding agent data that it can use to validate specific functions significantly increases the probability of a good result.

In the latest Claude Code run, with Opus advising Sonnet, the agent reconstructed every function and file and stopped at exactly one boundary: the attack functionality. It left those parts as documented stubs, with the surrounding variables kept in place so the project still compiled. Grok, given the identical text, reached the same functional code but spent more iterations getting there and recreated the DDoS loops without being asked. I did not run a full end-to-end offensive test, but the defensive reconstruction is complete enough.

Coding agents benefit not only threat actors, but also cyber defenders.
