Nviso Labs recently published a fascinating blog post illustrating the use of the Lua programming language with the Suricata DPI engine to detect obfuscation in PDF files. Deep analysis of content seen on networks is a topic close to our heart at Fidelis Cybersecurity. After reading that post, we decided to investigate how we could implement the same detection as a rule for the Yara content scanning engine within one of our own products. This post walks you through our logic and shows how trivial it is to apply the rule to PDF content in network traffic.
Analysis
First, a bit of background. Fidelis Network has the ability to rip through sessions and the content inside them. It provides a wide range of 'Analyzers' that can be applied to such sessions and content. One of the more popular analyzers integrates Yara. Yara is broadly favored by malware researchers for file analysis, particularly with a focus on determining maliciousness. The Fidelis Threat Research team finds Yara to be an immensely helpful tool in our daily work.
The core of our exercise is captured here in the Nviso Labs blog:
One of the elements that make up a PDF is a name. A name is a reserved word that starts with character / followed by alphanumerical characters. Example: /JavaScript. The presence of the name /JavaScript is an indication that the PDF contains scripts (written in JavaScript).
The PDF specification allows for the substitution of alphanumerical characters in a name by an hexadecimal representation: /J#61vaScript. #61 is the hexadecimal representation of letter a. We call the use of this hexadecimal representation in names “name obfuscation”, because it is a simple technique to evade detection by engines that just look for the normal, unobfuscated name (/JavaScript).
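To make the substitution concrete, here is a minimal Python sketch of how a scanner might undo it before matching. The helper names `decode_pdf_name` and `is_obfuscated` are ours, chosen for illustration; they do not come from the Nviso post or from any PDF library.

```python
import re

# A '#' followed by exactly two hexadecimal digits, as used inside PDF names.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_pdf_name(name: bytes) -> bytes:
    """Replace every #xx escape in a PDF name with the byte it encodes."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def is_obfuscated(name: bytes) -> bool:
    """Crude check (our assumption): a name is obfuscated if decoding changes it."""
    return decode_pdf_name(name) != name


print(decode_pdf_name(b"/J#61vaScript"))  # b'/JavaScript'
print(is_obfuscated(b"/JavaScript"))      # False
```

The check is deliberately simple: escapes are also used legitimately for characters such as spaces, so a real detector would probably only flag escapes that encode ordinary letters and digits.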
Doing it in Yara
Nviso takes the logic and creates an elegant Lua rule. Our rule below represents a slightly brute-force way of achieving the same detection, but this time in Yara.
rule GENERIC_PDF_ObfuscatedJavaScriptName {
    meta:
        copyright = "Fidelis Cybersecurity"
        description = "Detects PDF files with an obfuscated JavaScript name."
        source = "https://blog.nviso.be/2017/03/10/developing-complex-suricata-rules-with-lua-part-1/"
    strings:
        $name1 = { 2F 23 34 41 (23 36 31 | 61) (23 37 36 | 76) (23 36 31 | 61) (23 35 33 | 53) (23 36 33 | 63) (23 37 32 | 72) (23 36 39 | 69) (23 37 30 | 70) (23 37 34 | 74) }
        $name2 = { 2F (23 34 41 | 4A) 23 36 31 (23 37 36 | 76) (23 36 31 | 61) (23 35 33 | 53) (23 36 33 | 63) (23 37 32 | 72) (23 36 39 | 69) (23 37 30 | 70) (23 37 34 | 74) }
        $name3 = { 2F (23 34 41 | 4A) (23 36 31 | 61) 23 37 36 (23 36 31 | 61) (23 35 33 | 53) (23 36 33 | 63) (23 37 32 | 72) (23 36 39 | 69) (23 37 30 | 70) (23 37 34 | 74) }
        $name4 = { 2F (23 34 41 | 4A) (23 36 31 | 61) (23 37 36 | 76) 23 36 31 (23 35 33 | 53) (23 36 33 | 63) (23 37 32 | 72) (23 36 39 | 69) (23 37 30 | 70) (23 37 34 | 74) }
        $name5 = { 2F (23 34 41 | 4A) (23 36 31 | 61) (23 37 36 | 76) (23 36 31 | 61) 23 35 33 (23 36 33 | 63) (23 37 32 | 72) (23 36 39 | 69) (23 37 30 | 70) (23 37 34 | 74) }
        $name6 = { 2F (23 34 41 | 4A) (23 36 31 | 61) (23 37 36 | 76) (23 36 31 | 61) (23 35 33 | 53) 23 36 33 (23 37 32 | 72) (23 36 39 | 69) (23 37 30 | 70) (23 37 34 | 74) }
        $name7 = { 2F (23 34 41 | 4A) (23 36 31 | 61) (23 37 36 | 76) (23 36 31 | 61) (23 35 33 | 53) (23 36 33 | 63) 23 37 32 (23 36 39 | 69) (23 37 30 | 70) (23 37 34 | 74) }
        $name8 = { 2F (23 34 41 | 4A) (23 36 31 | 61) (23 37 36 | 76) (23 36 31 | 61) (23 35 33 | 53) (23 36 33 | 63) (23 37 32 | 72) 23 36 39 (23 37 30 | 70) (23 37 34 | 74) }
        $name9 = { 2F (23 34 41 | 4A) (23 36 31 | 61) (23 37 36 | 76) (23 36 31 | 61) (23 35 33 | 53) (23 36 33 | 63) (23 37 32 | 72) (23 36 39 | 69) 23 37 30 (23 37 34 | 74) }
        $name10 = { 2F (23 34 41 | 4A) (23 36 31 | 61) (23 37 36 | 76) (23 36 31 | 61) (23 35 33 | 53) (23 36 33 | 63) (23 37 32 | 72) (23 36 39 | 69) (23 37 30 | 70) 23 37 34 }
    condition:
        // Check for "%PDF" magic header, not needed when applied in Fidelis Network
        uint32(0) == 0x46445025 and
        any of them
}
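This rule takes advantage of the 'alternatives' feature in Yara, which applies to hexadecimal strings. For example, (23 34 41 | 4A) matches either the obfuscated form #4A or the plain letter 'J'. Each of the ten strings pins a different character position to its obfuscated form and leaves the other nine free to appear either way, so together they match every variant of the name in which at least one character is obfuscated, but never the plain /JavaScript on its own. The overall pattern, with one position pinned in each string, is: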
/ (#4A | J) (#61 | a) (#76 | v) (#61 | a) (#53 | S) (#63 | c) (#72 | r) (#69 | i) (#70 | p) (#74 | t)
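Writing ten near-identical hex strings by hand is error-prone, so a short script can generate them. The following Python sketch is our own illustration (the function names are hypothetical), and it assumes the same uppercase `#xx` escapes used in the rule above.

```python
def hex_of(data: bytes) -> str:
    """Render bytes as space-separated uppercase hex, as in a Yara hex string."""
    return " ".join(f"{b:02X}" for b in data)


def escaped(ch: str) -> bytes:
    """Return the obfuscated '#xx' form of a character, e.g. 'J' -> b'#4A'."""
    return f"#{ord(ch):02X}".encode("ascii")


def build_strings(name: str = "JavaScript") -> list[str]:
    """Build one Yara string per position, pinning that position to its #xx form."""
    lines = []
    for forced in range(len(name)):
        parts = ["2F"]  # the leading '/'
        for i, ch in enumerate(name):
            if i == forced:
                # This position must be obfuscated.
                parts.append(hex_of(escaped(ch)))
            else:
                # Every other position may be obfuscated or plain.
                parts.append(f"({hex_of(escaped(ch))} | {hex_of(ch.encode('ascii'))})")
        lines.append(f"$name{forced + 1} = {{ " + " ".join(parts) + " }")
    return lines


for line in build_strings():
    print(line)
```

Like the rule, the sketch emits uppercase hex digits only. If a reader also accepts lowercase digits, a variant such as `#4a` would need an extra alternative.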
You'd be right to look at this particular Yara rule and think it's complex. But if you're used to doing analysis by brute forcing through common obfuscations or encodings, then it's really not that much of a stretch. We'd say that it took our analyst about 20 minutes to write this up after reading the Nviso blog post.
Application in Fidelis Network
One of the upsides of doing this rule in Yara is that you can run it on multiple malware analysis platforms against stored file zoos. We ran it on one of the platforms that we have access to (we've posted discovered file hashes at the bottom of this post). In our analysis, just like Nviso reported, all the files we discovered had been determined to be malicious.
The Nviso blog concludes with a bit of a reality check: actually applying the Lua rule to network traffic with Suricata can affect the performance of the IPS engine. Some measures are proposed to tamp down on this impact. But to the trained eye, it's clear these are sub-optimal from a detection standpoint, even if they let you keep your IPS running.
This is where Fidelis Network really makes a difference. It lets you apply Yara rules to objects that have been extracted from network sessions. Even better, it performs file classification independently of the rule. You get to specify 'I want to apply this Yara rule against PDF files' and now the rule is applied to files that Fidelis Network identifies as PDFs, regardless of the network protocol or intermediate encoding or compression layers.
The approach provides these benefits:
- The rule is applied to all network traffic – HTTP, SMTP, POP3, FTP, SMB and many others. Email is a primary vector for the delivery of malicious PDFs, so it's essential to apply this rule to all traffic.
- The system consistently identifies PDF files using a variety of techniques that are not easily bypassed by obfuscation (note the %PDF- identifier and accompanying comment in the Nviso blog).
- You're only incurring the compute cost of the rule when the system identifies a PDF file.
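The second point is worth a closer look. A standalone rule that checks `uint32(0) == 0x46445025` only recognizes a file whose very first four bytes are `%PDF`. The Python sketch below shows what that integer comparison means and contrasts it with a looser search. The 1024-byte window reflects commonly documented PDF reader leniency and is an assumption on our part; it is not a description of how Fidelis Network classifies files.

```python
import struct

PDF_MAGIC = b"%PDF"


def strict_magic(data: bytes) -> bool:
    """Same test as Yara's uint32(0) == 0x46445025: '%PDF' read as a little-endian integer."""
    return len(data) >= 4 and struct.unpack_from("<I", data, 0)[0] == 0x46445025


def lenient_magic(data: bytes, window: int = 1024) -> bool:
    """Assumed reader leniency: accept the marker anywhere in the first `window` bytes."""
    return PDF_MAGIC in data[:window]


padded = b"\n\n" + b"%PDF-1.4\n"  # two junk bytes ahead of the header
print(strict_magic(b"%PDF-1.4\n"), strict_magic(padded), lenient_magic(padded))
# True False True
```

A file padded in this way slips past the strict offset-zero check, which is one reason to let a separate classifier decide what counts as a PDF.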
Don't get us wrong. This is not intended to be a knock on the Suricata+Lua combination, which after all is an open-source tool used by countless network defenders worldwide.
Instead, after reading the post, we saw what looked like a good opportunity to highlight our implementation and the benefits of using Yara in an intrusion prevention context.
The following screenshots show what the detection would look like in Fidelis Network when the file is transferred over HTTP:
![Image3 Image3]()
Note: After we wrote this post, Nviso published a follow-up post discussing an iterative process for improving the Lua rule. This kind of iteration is exactly what we like about Yara, since it has an extensive toolset and ecosystem in place, including the ability to run rules against datasets. See the hashes in Appendix B for an example of what we discovered with such a run.
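If you want to check your own samples against the published hashes, a few lines of Python are enough. This sketch assumes the hashes are SHA-256 digests, which the post does not state explicitly but which their 64-character length suggests. The set below holds only the first two entries from Appendix B, and the function names are ours.

```python
import hashlib

# First two entries from Appendix B; extend with the full list as needed.
KNOWN_HASHES = {
    "f5aac4bb54cc524f91ed78952ecc12d7ea5c07d9ceab72516fa7cbcf46f0506f",
    "e56b90588bb9fcc0ee98db85bc20a47cfde87da079bb2a2ae4b14a32339942ea",
}


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hexadecimal SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def is_known_sample(data: bytes) -> bool:
    """True if the content hashes to one of the listed digests."""
    return sha256_hex(data) in KNOWN_HASHES


print(is_known_sample(b"%PDF-1.4 harmless test content"))  # False
```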
Appendix A - Alternate Yara rule
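The Yara rule below is one of our analyst's earlier iterations. It is much more elegant and compact than the verbose final version above, and functionally equivalent to it. But this rule raises performance warnings from the native Yara engine. The cause lies in how the Yara engine selects "atoms," the short fixed byte sequences it uses to find candidate matches quickly. In the compact rule, the only byte guaranteed to appear is the leading 2F, whereas every string in the final version contains a fixed three-byte #xx sequence. The compact approach is therefore more computationally intensive, and we had to de-optimize the syntax to create the final version shown above.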
rule GENERIC_PDF_ObfuscatedJavaScriptName {
    meta:
        copyright = "Fidelis Cybersecurity"
        description = "Detects PDF files with an obfuscated JavaScript name."
        source = "https://blog.nviso.be/2017/03/10/developing-complex-suricata-rules-with-lua-part-1/"
    strings:
        // "/(#4A|J) (#61|a) (#76|v) (#61|a) (#53|S) (#63|c) (#72|r) (#69|i) (#70|p) (#74|t)"
        $name = { 2F (23 34 41 | 4A) (23 36 31 | 61) (23 37 36 | 76) (23 36 31 | 61) (23 35 33 | 53) (23 36 33 | 63) (23 37 32 | 72) (23 36 39 | 69) (23 37 30 | 70) (23 37 34 | 74) }
        $jsname = "/JavaScript"
    condition:
        uint32(0) == 0x46445025 and // Check for "%PDF" magic header
        // $name matches both plain and obfuscated names, so a higher count
        // means at least one occurrence is obfuscated
        #name > #jsname
}
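The counting trick in the condition can be reproduced outside Yara. The Python sketch below is our own approximation using regular expressions rather than anything from the Yara engine: it builds the same alternation as `$name`, counts its matches, and compares that with the number of plain `/JavaScript` matches.

```python
import re

# Same alternation as $name: every character may be plain or written as #xx.
NAME = re.compile(
    rb"/(?:#4A|J)(?:#61|a)(?:#76|v)(?:#61|a)(?:#53|S)"
    rb"(?:#63|c)(?:#72|r)(?:#69|i)(?:#70|p)(?:#74|t)"
)
PLAIN = re.compile(rb"/JavaScript")


def has_obfuscated_name(data: bytes) -> bool:
    """Mirror of `#name > #jsname`: true when some match of NAME is not the plain name."""
    return len(NAME.findall(data)) > len(PLAIN.findall(data))


sample = b"%PDF-1.4 << /JavaScript 1 0 R >> << /J#61vaScript 2 0 R >>"
print(has_obfuscated_name(sample))                          # True
print(has_obfuscated_name(b"%PDF-1.4 << /JavaScript >>"))   # False
```

Because the general pattern also matches every plain occurrence, the two counts differ only when at least one obfuscated occurrence is present.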
Appendix B - Detected File Hashes
f5aac4bb54cc524f91ed78952ecc12d7ea5c07d9ceab72516fa7cbcf46f0506f
e56b90588bb9fcc0ee98db85bc20a47cfde87da079bb2a2ae4b14a32339942ea
a132a9a1aadbf70a124e1b2214e41e28b1be5075c1768ea733e5e2ab8bc85769
6febdf27633c88fb46ba07b3cc5fb256df88fe79af5232184ab50cef4831aca8
b39c15514664acba77a7ce63e7c6640e0f532aef6ceb18c931be0be601f10ff8
7aa45d7252507f8f15613162dbf363ab804d1b7b8dc330eda4f0d3d9ffcc32e3
4ae4dd8dfe601fad097f19f2242c0949093928ef18444564e5272d85acbf7831
6c94497b79dab4f88ce1d7af7b85420566434cebcb9fc51dd83327482bf1ec43
41870fd6cdc3fdfaf610c4b132134ae1eb21c1ac472317883f832707dc0cd037
8cecb8e4091b64683d0acd567d6c115ff10defe5be7dec0d13c06d39119bd08f
367547f151358c3ff872bda0017ed0871842b946c7b61da5e4d91f48176a617d
abd8af58913feaa1a372ba3b08ea9d1fbf61816e49e8ad6a6e89fc34d27e7dfd
6c83a329f66deea6773bc3c83bdae22e93a5c80c829528de7d49d70208f7813c
c4bf178d7a0791c8373c1d35d36e05c685bc0928337864f6947dbac9c3cef203
b93bbdd37216d44226173a972b415a6eea96a88ba1949f8c9e6308261994b388
9960414632c92785b3d865af11dd16efb3f2aca55f403cba23882d0174d556e3
1a464927cfc31e1bf09783b9b7470e95fe522b490f913a21660f4d782f1881f2
3b0c506e0f2c0ae7820d1be408ede6d08a9851eb2218ac957dff75166f0a103f
0495bebd383dee37b558293667432be9fb411379d76abac00eeb636d5e5e9a2a
0c0530da094fa8ee4ab8149ddb4d038dc0361877bb110981968b03bf05fca764
e0098144277792b1a9385af30ec70c19545cd232bc089ae8171edb52f887c549
f86471bd3e88462ddda96c38650d6b53277135389b44d992d7c4b293ad5609af
65ce23bcc7db4494cd21a103c4218d5216fe7213707fe1445cff5cd915abe5ab
2a803e67417e5b6f6b724220eb2861391a855f138a124e356895b96016b00b22
f148d687af45fc320574e2ebbe1ffb25ab7008c5aca43257516895861d38099e
a88e416b98b6a8ba28d0e6da96bb39295b2ccfda28d2246bebc59c77437685e4
c5f9cebf083f1fb38f1fc20518b512240ca37e3a1eed93badf983a0500705da5
7d336f0fef2bd1f7e98039ad2a69cc0a0bc543d9741dc8a81577cc3b57274b46
90b71d61f4a10dc3b34e6f9650fd212feec34c970a7c21802459a0e828680544
9519b0828733770b2ca1376962922a1df9e10f6218326a6af4b91c6c89bce157
0af2001aa3c97d32567d9990785dfefb4de54aae50fbf9af3f1c5953c6edd6d7
3a2c4b451291a80062c66a86d1ea77547140e5a7186cd6edaa323a0f3d87d239
c887c89347980dfc6b18d19d96786a2cfd065a98ad349d6fcf918ec9159a7ddf
516457828ad97206e58b8dffce02cce303afe01abf0cbaaadf96d829d842191f
64faef3e3df9e464a01a37c19d578b88ecb96c51312800b21f46e3e58e36eb7a
982f73491d8ffa1c27d7c4375d50e0212e02e02751e3186a92256abbff98c3ed
86dcc1251df159ce58628b2b9a852c66199448f545084de1821a719c88120d38
6f3c5d03b0c58b763fc23a14e7ad1a523b46ca7453c6c8e0c3959272928a0323
25d4eb72174ac2279a0aaaecb84648328895f7b5faaa413fd6b7db5061eb963c
876df82e3dfdca50b72f3fb28b71088101c42c86762696a7115973e0cac2d287
3ca77800a2caaeb8050c3e3a9f602ed44fe27371b582ddcd0f0bea4aab35797c
b85bca0c26dee077b066716faee735a8471951a7cba6379fdc7e7b38ac2a2701
e218d13461ebededc13b1c3fdc9edeaaf26a480249152ce3fee91cfcddd2d18c
492e1abae4d41fd508abc6043f8503006473b32d1e640055994024fa2c9f39e7
a7de24f17ffd26828baa68f2541533b333c15345ab9841abf97c7f720abea24f
0cf6492f1faa00794438ed5f587fd82c48b1a0d3b7233d9dbdb654ff4fd95654
788bdf912ff4bfe3998716ca5421e61619b317010d1a57cfad3064254fd9ae43
579d7b3a8b0222a75035e071daabcde10dc06b7494dc57b1603117e6f9ff59f5
bfe623fa10ee472964ad73df43ed8d4ae3a6df2e13bbbe29f5a9f0a99c87944e
-- Fidelis Threat Research Team