Most of the apps on my phone aren’t obfuscated

I think it’s important for any app that deals with sensitive user data to incorporate code obfuscation into their security.  While far from impenetrable, it’s a useful layer in thwarting reverse-engineers from understanding your app and using that knowledge against you.  If you’ve wondered why I seem to be on a code obfuscation kick recently, it’s because I’ve noticed, anecdotally, that a lot of apps I expected to be using obfuscation, weren’t.  So I set out to see if I could do some research and turn that anecdote into data.

I took all of the apps I have on my phone right now, and calculated how well the code was obfuscated (see methodology notes at the end of the post if you’re curious).  Here are the points that jumped out at me:

  • Most of the apps on my phone aren’t obfuscated. 53% of apps fall in the 0-20% obfuscated bucket, which means they probably didn’t have ProGuard turned on at all.
  • Most of the remaining apps are poorly-to-medium obfuscated. The next 35% of apps are 20-60% obfuscated, which means they probably put some effort into obfuscating the code, but weak configurations (like overusing the -keep directive) have kept much of their code un-obfuscated.
  • A small portion of apps are well obfuscated. Just 12% of apps are in the 60+% obfuscated range, where most of their code is very difficult to follow.

Why would an app with ProGuard turned off have a score greater than 0? A small part of that would be due to false positives (see the “Methodology” section, below) but most is due to third-party library code that has its internal implementation details obfuscated, before that code is even packaged into an app.

Methodology

    • Corpus – As alluded to in the post, I ran this analysis on the apps that I have installed on my personal phone.  This group of apps probably isn’t a perfectly representative sample of apps on the app store.  It is, however, a representative sample of apps that I actually care about 🙂
    • Tools – to calculate the “% obfuscated” metric, I used the `apkanalyzer` tool included in the Android SDK.  Looking through the classes, methods, and fields, I counted the number that appear to have been obfuscated (see next point), and added them all up for a ratio.
    • Metric – deciding whether something has been obfuscated isn’t straightforward; different obfuscation methods will yield different results.  But I know ProGuard is a common tool for Android obfuscation, and it prefers to transform names to single-character names, so I checked for single-character names.  This has the potential for false positives (e.g. x, y, z in a graphics data class would register as obfuscated even though those are probably the original names) and false negatives (any other obfuscation method might choose different names which aren’t single-character, which this analysis would miss).

 

ProGuard pro-tip: Don’t use ProGuard’s “-include” with Gradle builds

I like to organize my ProGuard rules by breaking them up into separate files, grouped by what they’re acting on — my data model, third-party libs, etc.

It’s tempting to reference one main ProGuard file from your build.gradle, and then put ‘-include’ entries in the main file to point to the rest, but there’s a problem with this. If you make a change in one of the included files, it won’t be picked up until you do a clean build.

The reason for this is Gradle’s model of inputs and outputs for tasks.  Gradle is really smart about avoiding unnecessary work, so if none of the inputs for a task have changed, then Gradle won’t rerun the task.  If you list one main ProGuard file in your build.gradle and include the rest from there, Gradle only sees the main file as an input.  So if you make changes in an included file, Gradle doesn’t think anything changed.

The easy way I’ve found to work around this is to put all of my ProGuard rules into a directory, and include them all in your build.gradle with this snippet:

Distinguishing between the different ProGuard “-keep” directives

If you search for ProGuard rules for a Java or Android library, you’ll see a lot of answers on StackOverflow that tell you to do something like this:

-keep class com.foo.library.** { *; } 

That advice is really bad, and you should never do it.  First, it’s overly broad — that double-asterisk in the package means every class under every package under that top-level package; and the asterisk inside the curly braces applies to every member (variables, methods, and constants) inside those class.  That is, it applies to all code in the library.  If you use that rule, Jake Wharton is going to come yell at you:

Second, and what this post is about, is the beginning of the directive, that “-keep”.  You almost never want to use -keep; if you do need a ProGuard rule, you usually want one of the more specific variants below.  But it always takes me a minute with the ProGuard manual to figure out which one of those variants applies to my case, so I made some tables for quick visual reference.  (Quick aside: the ProGuard manual is very useful and I highly recommend you look through it.)

No rule

To get our bearings, let’s look at the default.  If you don’t specify a keep directive of any kind, then ProGuard is going to do it’s normal thing — it’s going to both shrink (i.e. remove unused code) and obfuscate (i.e. rename things) both classes and class members.

-keep

See, this is why I said you should almost never use -keep.  -keep disables all of ProGuard’s goodness.  No shrinking, no obfuscation; not for classes, not for members.  In real use cases, you can let ProGuard do at least some of it’s work.  Even if your variables are accessed by reflection, you could remove and rename unused classes, for example.  So let’s look through the more specific -keep variants.

-keepclassmembers

This protects only the members of the class from shrinking and obfuscation.  That is, if a class is unused, it will be removed.  If the class is used, the class will be kept but renamed.  But inside any class that is kept around, all of its members will be there, and they will have their original names.

-keepnames

This allows shrinking for classes and members, but not obfuscation.  That is, any unused code is going to get removed.  But the code that is kept will keep its original names.

-keepclassmembernames

This is the most permissive keep directive; it lets ProGuard do almost all of its work.  Unused classes are removed, the remaining classes are renamed, unused members of those classes are removed, but then the remaining members keep their original names.

-keepclasseswithmembers

This one doesn’t get a table, because it’s the same as -keep.  The difference is that it only applies to classes who have all of the members in the class specification.

-keepclasseswithmembernames

Similarly, this rule is the same as -keepnames.  The difference, again, is that it only applies to classes who have all of the members in the class specification.

Conclusion

You want to let ProGuard do as much work as possible, so pick the directive that has the fewest red X blocks above, while still meeting your need.