From binary to working Go: reconstructing a malware sample with AI coding agents
When ChatGPT was released, it was helpful in some cases to copy decompiled pseudo-code from IDA Pro into ChatGPT and get faster feedback on how the pseudo-code could be translated into real Go code. Reading real Go code is a lot faster than interpreting pseudo-code. Over time, ChatGPT and Claude have both improved their analysis capabilities.
Around the middle of 2025, frontier labs started releasing coding agents, notably Claude Code. Now it seems a new coding agent is released on a weekly basis.
I started trying Claude Code to help with the reversing and decompilation process, but I wanted to get real Go code, not pseudo-code. Initially, the results were not great, partly due to guardrails that in many cases caused the models to refuse to work on malware. This could be worked around in some cases, but the results were still not impressive.
Since autumn 2025, though, something started changing. The models got noticeably better, the guardrails were better tuned to cyber work, and OpenAI launched Trusted Access for Cyber while Anthropic launched its Cyber Verification Program (CVP). Both are intended to adjust the guardrails for cyber professionals doing defensive work.
Let’s look at one example.
Reconstructing the source code
The sample is a piece of Go malware. libgojni.so is the native component of an
Android app, compiled with golang.org/x/mobile (Go Mobile) so the app can call
into it over JNI. It is a 32-bit x86 c-shared binary and, because Go binaries
carry their own metadata whether you want them to or not, far more legible than
a stripped C binary of the same size.
The binary
Before reading a single line of decompiler output, a few read-only tools
establish ground truth. With a Go binary, this step pays for itself immediately,
because the toolchain leaves the function names, file names, and line ranges
sitting in the pclntab.
redress info libgojni.so # Go toolchain version, build flags, module graph
go version -m libgojni.so # embedded module version information
redress source libgojni.so # file names and line ranges for every function
nm / objdump # symbol table, raw disassembly spot-checks
binwalk libgojni.so # embedded signatures (AES S-boxes at 0x8B0F40)

| Field | Value |
|---|---|
| Compiler | go1.25.1 (2025-09-03) |
| Target | GOOS=android GOARCH=386 (x86, 32-bit) |
| Build mode | -buildmode=c-shared (CGO enabled) |
| Module path | mobile_client |
| Notable deps | quic-go v0.55.0, golang.org/x/crypto v0.43.0, gopsutil/v3 v3.24.5 |
For decompilation, I used IDA Pro, not Binary Ninja. Both are capable
decompilers, but IDA currently resolves the internal Go symbols from the
pclntab structure that the linker embeds, and Binja does not yet. On a Go
binary, that is the difference between a named call graph and a wall of
sub_4xxxxx. redress source gives you the same map a second way: every
function name, its source file, and its line range, which means the package
structure is known before any code is written.
What the agent actually sees
The input is deliberately narrow. The agent gets:
- the IDA Pro pseudo-C, one .dec file per source file;
- the redress source function-to-file map, so packages and file boundaries are fixed up front;
- a captured C2 session with real ciphertext blobs, used as test vectors;
- a task spec that draws two lines.
The build target is the local Linux host, not Android, so the CGO bindings and
Go Mobile glue (_cgo_gotypes.go, seq_android.go) are out. Only the actual
application code is reconstructed.
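Taken together, the redress source map and this exclusion rule produce a file plan before any code exists. The sketch below assumes the map has already been parsed into rows of function name, file, and line range. The fn type and its row format are hypothetical and do not match redress output. It also assumes glue files can be recognised by name, generalising from the two files named above.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// fn is one row of a function-to-file map (hypothetical format).
type fn struct {
	Name       string
	File       string
	Start, End int
}

// excluded reports whether a file is CGO or Go Mobile glue that the task
// spec leaves out. The name patterns are an assumption.
func excluded(file string) bool {
	base := path.Base(file)
	return strings.HasPrefix(base, "_cgo_") || base == "seq_android.go"
}

// plan groups application functions by source file, ordered by start line,
// giving the skeleton of each .go file before any code is written.
func plan(rows []fn) map[string][]fn {
	out := map[string][]fn{}
	for _, r := range rows {
		if !excluded(r.File) {
			out[r.File] = append(out[r.File], r)
		}
	}
	for _, fns := range out {
		sort.Slice(fns, func(i, j int) bool { return fns[i].Start < fns[j].Start })
	}
	return out
}

func main() {
	rows := []fn{ // hypothetical rows, not taken from the sample
		{"pkg.Decrypting", "crypto.go", 40, 58},
		{"pkg.Encrypting", "crypto.go", 12, 38},
		{"pkg._cgoexp_x", "_cgo_gotypes.go", 1, 9},
	}
	for file, fns := range plan(rows) {
		fmt.Println(file, len(fns), fns[0].Name) // crypto.go 2 pkg.Encrypting
	}
}
```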
A worked example: key derivation
The most useful test of whether the agent can really read decompiler output is a function whose behaviour is not obvious from the decompiled code. The encryption uses AES-256-GCM with a 12-byte prepended nonce, and the key derivation is the kind of thing that is easy to get almost right.
Here is the IDA pseudo-C for the key setup:
if ( a4 != 0 || a5 != 0 )
{
  v23 = runtime_convTstring(a6, a7);
  v12 = runtime_int64div(a4, a5, 5, 0);
  v25 = (void *)runtime_convT64(v12, v14);
  v15 = fmt_Sprintf((int)"%s%v", 4, (int)&v22, 2, 2);
}
else
{
  v23 = runtime_convTstring(a6, a7);
  v25 = &unk_FF5B8;
  v15 = fmt_Sprintf((int)"%s%v", 4, (int)&v22, 2, 2);
}
if ( v8 < 0x20 )
  runtime_panicSliceB(v8);
v11 = crypto_aes_NewCipher((int)v7 + v8 - 32, 32, 32);

Four details decide the reconstruction, and none of them are spelled out:
- a4/a5 are the low and high DWORDs of the int64 token, split by the 32-bit x86 calling convention. They are one argument, not two.
- runtime_int64div(a4, a5, 5, 0) is token / 5; the divisor is split the same way, as the pair (5, 0).
- &unk_FF5B8 is the integer constant 0 in the data segment; under %v it renders as the string "0".
- (int)v7 + v8 - 32 with the v8 < 0x20 guard is []byte(keyStr)[len(keyStr)-32:], the last 32 bytes, with a length check.
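Three of these details can be checked in a few lines of Go. The sketch assumes Go's 386 convention of passing an int64 as two stack words, low word first, which is also why the divisor 5 appears as (5, 0). The helpers joinInt64 and lastKeyBytes are illustrative and are not functions from the sample.

```go
package main

import "fmt"

// joinInt64 rebuilds an int64 argument from the two 32-bit stack words
// the decompiler shows separately (low word first on 386).
func joinInt64(lo, hi uint32) int64 {
	return int64(uint64(hi)<<32 | uint64(lo))
}

// lastKeyBytes mirrors the v8 < 0x20 guard and the (v7 + v8 - 32) pointer:
// it returns the final 32 bytes of keyStr, or false if keyStr is too short.
func lastKeyBytes(keyStr string) ([]byte, bool) {
	if len(keyStr) < 32 {
		return nil, false
	}
	return []byte(keyStr)[len(keyStr)-32:], true
}

func main() {
	token := joinInt64(8000, 0)
	fmt.Println(token / 5) // 1600

	keyStr := fmt.Sprintf("%s%v", "base", 0) // constant 0 under %v
	fmt.Println(keyStr)                      // base0

	key, ok := lastKeyBytes("xx0123456789abcdef0123456789abcdef")
	fmt.Println(string(key), ok) // 0123456789abcdef0123456789abcdef true
}
```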
The Encrypting counterpart is where reading carefully pays off: both branches
of its if/else are byte-for-byte identical, which means Encrypting always
appends "0" regardless of the token. Miss that and you produce a function that
looks plausible and decrypts nothing.
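The identical-branch detail can be written down directly. The following sketch of the Encrypting side relies on two assumptions that the excerpt does not show. First, it assumes Encrypting takes the same keyBase and token arguments as Decrypting. Second, it assumes the 12-byte nonce comes from crypto/rand. Whatever the token, the key suffix is "0".

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// encryptKey mirrors the observed behaviour: both branches format keyBase
// with the constant 0, so token never reaches the key.
func encryptKey(keyBase string, token int64) ([]byte, error) {
	keyStr := fmt.Sprintf("%s%v", keyBase, 0) // token deliberately unused
	if len(keyStr) < 32 {
		return nil, errors.New("key too short")
	}
	return []byte(keyStr)[len(keyStr)-32:], nil
}

// Encrypting seals plaintext with AES-256-GCM and prepends the nonce,
// producing nonce || ciphertext || tag. The signature is an assumption.
func Encrypting(plaintext []byte, keyBase string, token int64) ([]byte, error) {
	key, err := encryptKey(keyBase, token)
	if err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize()) // 12 bytes for standard GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext||tag to its first argument (the nonce).
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

func main() {
	base := "0123456789abcdef0123456789abcdef" // hypothetical key base
	a, _ := encryptKey(base, 0)
	b, _ := encryptKey(base, 12345)
	fmt.Println(string(a) == string(b)) // true: the token is ignored
}
```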
Reconstructed:
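import (
	"crypto/aes"
	"crypto/cipher"
	"errors"
	"fmt"
)

// Decrypting opens nonce||ciphertext||tag with AES-256-GCM. The key is the last
// 32 bytes of keyBase + "0" (token == 0) or keyBase + token/5.
func Decrypting(ciphertext []byte, keyBase string, token int64) ([]byte, error) {
	var keyStr string
	if token == 0 {
		keyStr = fmt.Sprintf("%s%v", keyBase, 0)
	} else {
		keyStr = fmt.Sprintf("%s%v", keyBase, token/5) // int64 boxed, as via convT64
	}
	if len(keyStr) < 32 {
		return nil, errors.New("key too short")
	}
	key := []byte(keyStr)[len(keyStr)-32:]
	// A 32-byte key always yields a valid AES block, and NewGCM cannot fail
	// on a standard AES block, so both errors are safe to ignore here.
	block, _ := aes.NewCipher(key)
	gcm, _ := cipher.NewGCM(block)
	if len(ciphertext) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, data := ciphertext[:gcm.NonceSize()], ciphertext[gcm.NonceSize():]
	return gcm.Open(nil, nonce, data, nil)
}

The keyBase itself is built differently per context: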
userHash[:5] + appHash[5:] for login and target traffic, and a
timestamp-derived value for the inner layer of result reporting. That was
confirmed the only way it can be: by decrypting a captured payload and checking
the recovered fields against a packet capture.
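The key base for login and target traffic is a plain string splice. This sketch treats userHash and appHash as opaque strings, because the excerpt does not show how the sample computes them. It also adds a length guard that the original may not have.

```go
package main

import (
	"errors"
	"fmt"
)

// loginKeyBase splices the first five bytes of userHash onto appHash
// from offset 5 onward. Inputs are treated as opaque strings.
func loginKeyBase(userHash, appHash string) (string, error) {
	if len(userHash) < 5 || len(appHash) < 5 {
		return "", errors.New("hash too short")
	}
	return userHash[:5] + appHash[5:], nil
}

func main() {
	kb, _ := loginKeyBase("AAAAAAAAAA", "bbbbbbbbbb")
	fmt.Println(kb) // AAAAAbbbbb
}
```

The result is exactly as long as appHash, because userHash contributes the five bytes that appHash loses.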
Those blobs are embedded as constants in a *_test.go file and used as
test vectors: the full key-derivation, AES-GCM decrypt, JSON unmarshal
pipeline is exercised against data the original binary produced. If the
reconstruction is wrong by one byte, these fail.
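The following sketch shows one possible shape for such a harness. Each vector carries a captured blob and the fields it should decode to, and the check runs key derivation, AES-GCM open, and JSON unmarshal end to end. The vector struct, the seal helper, and the JSON fields are hypothetical. The seal helper exists only to build a synthetic vector, because the captured blobs themselves are not reproduced here.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
)

// vector pairs one blob with the fields expected after decoding (hypothetical).
type vector struct {
	name    string
	keyBase string
	token   int64
	blob    []byte         // nonce || ciphertext || tag
	want    map[string]any // fields confirmed against the packet capture
}

// newGCM applies the key derivation described above: last 32 bytes of
// keyBase + "0" or keyBase + token/5.
func newGCM(keyBase string, token int64) (cipher.AEAD, error) {
	var suffix any = 0
	if token != 0 {
		suffix = token / 5
	}
	keyStr := fmt.Sprintf("%s%v", keyBase, suffix)
	if len(keyStr) < 32 {
		return nil, errors.New("key too short")
	}
	block, err := aes.NewCipher([]byte(keyStr)[len(keyStr)-32:])
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal builds a synthetic blob; real vectors come from the capture.
func seal(keyBase string, token int64, nonce, plain []byte) ([]byte, error) {
	gcm, err := newGCM(keyBase, token)
	if err != nil {
		return nil, err
	}
	return gcm.Seal(append([]byte(nil), nonce...), nonce, plain, nil), nil
}

// decode runs the pipeline: key derivation, AES-GCM open, JSON unmarshal.
func decode(v vector) (map[string]any, error) {
	gcm, err := newGCM(v.keyBase, v.token)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(v.blob) < n {
		return nil, errors.New("blob too short")
	}
	plain, err := gcm.Open(nil, v.blob[:n], v.blob[n:], nil)
	if err != nil {
		return nil, err
	}
	var got map[string]any
	err = json.Unmarshal(plain, &got)
	return got, err
}

// checkVectors fails on the first vector that does not decode to its fields.
func checkVectors(vs []vector) error {
	for _, v := range vs {
		got, err := decode(v)
		if err != nil {
			return fmt.Errorf("%s: %w", v.name, err)
		}
		if !reflect.DeepEqual(got, v.want) {
			return fmt.Errorf("%s: got %v, want %v", v.name, got, v.want)
		}
	}
	return nil
}

func main() {
	base := "0123456789abcdef0123456789abcdef" // hypothetical key base
	blob, _ := seal(base, 10, make([]byte, 12), []byte(`{"id":"t1"}`))
	vs := []vector{{name: "synthetic", keyBase: base, token: 10, blob: blob,
		want: map[string]any{"id": "t1"}}}
	fmt.Println(checkVectors(vs)) // <nil>
}
```

JSON numbers decode into float64 under map[string]any, so expected numeric fields must be written that way.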
Strings, by length
A smaller forensic detail, same idea. The decompiler reports string lengths, and those lengths disambiguate strings you might otherwise copy incorrectly.
| Header | Value | Length (bytes) |
|---|---|---|
| Accept (attack / login) | text/html,application/xhtml+xml,application/xml, | 48 |
| Accept (result reporting) | …application/xml,application/json | 64 |
| Accept-Encoding (attack) | gzip, deflate, br | 17 |
| Accept-Encoding (health check) | text | 4 |
The 48-vs-64 Accept header is also a network-level indicator that separates attack traffic from reporting traffic, which is the kind of thing this whole exercise exists to surface.
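A Go string is a pointer plus a length, so the decompiler shows a length next to every string reference. That property gives a quick consistency check and, as a rough heuristic, a traffic classifier. The constants in the sketch come from the table above. Matching on length alone is an assumption; a real detection rule would also compare the value.

```go
package main

import "fmt"

// Accept header lengths recovered from the decompiled string headers.
const (
	acceptAttackLen    = 48 // attack / login traffic
	acceptReportingLen = 64 // result reporting
)

// classifyAccept labels an observed Accept header by its byte length.
func classifyAccept(accept string) string {
	switch len(accept) {
	case acceptAttackLen:
		return "attack/login"
	case acceptReportingLen:
		return "reporting"
	default:
		return "unknown"
	}
}

func main() {
	attack := "text/html,application/xhtml+xml,application/xml,"
	fmt.Println(len(attack), classifyAccept(attack)) // 48 attack/login
	fmt.Println(len("gzip, deflate, br"))           // 17
}
```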
The Claude Code run
The most recent attempt used Claude Code in its default configuration: Sonnet
4.6 as the main model, Opus 4.7 in advisor mode. Given the same plain-text
decompiler output and the redress map, it produced every function and file
called for by the decompilation.
For the first time, nothing was left as accidental dead code. The only gaps were
the offensive parts, and those were deliberate. They were documented stubs:
// STUB: this code is malicious and therefore not implemented, with the
surrounding variables kept in place so the package still compiles. The
functional behaviour matches the binary. The model occasionally chose a
different implementation or added logging and extra checks, which for this
purpose is fine; functional equivalence is the bar, not byte-for-byte mimicry.
| Component | Status |
|---|---|
| AES-256-GCM encrypt / decrypt | complete |
| Key derivation | complete |
| Login handshake | complete |
| Random string generation | complete |
| Request construction | complete |
| Target fetch + decrypt | complete |
| Result reporting | complete |
| Health / IP check | complete |
| HttpJob flood worker | stub |
| NGINXLoris.Flood | stub |
| TargetWorker dispatch loop | stub |
| App CLI | complete |
The three stubs keep their signatures, their struct definitions, and a comment
describing what the original did: the transport dispatch by target.Type, the
Slowloris connection-holding pattern, and the goroutine fan-out. They contain no
working attack loop. That is enough to analyse, but not enough to run an attack.
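The sketch below shows that stub shape. Every name in it is hypothetical: the Flood signature, the Target fields, and errStub are not taken from the sample. It illustrates one Go-specific wrinkle. Go rejects unused local variables at compile time, so locals carried over from the original must be discarded explicitly to keep the package compiling.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errStub marks functionality deliberately left unimplemented.
var errStub = errors.New("stub: malicious functionality not implemented")

// Target is a hypothetical reduction of the sample's target record.
type Target struct {
	Type string
	Host string
}

// NGINXLoris keeps its struct shape so callers still compile.
type NGINXLoris struct {
	target Target
	conns  int
}

// Flood is a documented stub.
// STUB: this code is malicious and therefore not implemented.
// Original behaviour (summary only): Slowloris-style connection holding.
func (n *NGINXLoris) Flood(ctx context.Context) error {
	// Locals from the original are kept; Go rejects unused locals,
	// so they are explicitly discarded.
	host, conns := n.target.Host, n.conns
	_, _ = host, conns
	return errStub
}

func main() {
	n := &NGINXLoris{target: Target{Type: "http", Host: "example.invalid"}, conns: 1}
	fmt.Println(errors.Is(n.Flood(context.Background()), errStub)) // true
}
```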
This is the first run where I am convinced the agent can reconstruct a decompiled binary in full, not most of it. That is not the same as saying there are no limits left. There are.
A note on Grok
I ran the same input through Grok Build (CLI). It needed many more iterations to get there, and then reconstructed the offensive code that every other run left stubbed. Same binary, same plain-text decompilation, and no instruction to implement the attack loops; it implemented them anyway.
The challenges
The failure modes have been consistent across agents and versions, and most of them are not about whether the model can write Go:
- Model lock-in. Some harnesses, such as Antigravity, did not let me change the model, which made A/B comparison impossible.
- Token cost. Go binaries are large, so you usually include only the functions in question. For a mobile app, that is fiddly but doable because Go is cross-platform.
- Iteration discipline. “Do task 1, then 2, then 3” is not always followed. Smaller tasks with explicit iteration work better, even when the agent should be able to plan it itself.
- Coverage. Not all functions get written on the first pass; some are generated completely, some only barely, and some not at all. More iterations close the gap.
- Prompt and diff hygiene. Being able to revise prompts and use Git to track what each iteration changed is what turns this from a demo into a method.
- Expecting too much. The agents can compile and run the toolchain and iterate until the build is green; they are less reliable at deciding when the job is actually finished.
A recurring positive: agents that write *_test.go files to validate their own
implementation tend to converge faster, because the test is a fixed point the
iteration can pull toward.
Final observations
Preparing input to a coding agent is vital; do not just dump a binary directly into the agent. Either prepare the input before starting, or equip the coding agent with the right skills. In particular, feeding the coding agent data that it can use to validate specific functions significantly increases the probability of a good result.
In the latest Claude Code run, with Opus advising Sonnet, the agent reconstructed every function and file and stopped at exactly one boundary: the attack functionality. It left those parts as documented stubs, with the surrounding variables kept in place so the project still compiled. Grok, given the identical text, reached the same functional code but spent more iterations getting there and recreated the DDoS loops without being asked. I did not run a full end-to-end offensive test, but the defensive reconstruction is complete enough for analysis.
Coding agents benefit not only threat actors, but also cyber defenders.
#Reverse-Engineering #Go #Ida-Pro #Decompilation #Malware #Ai-Assisted #Claude #Codex #Pi #Grok #Redress