Feeding ProGuard’s inputs: where are all of these rules coming from?

When ProGuard runs on your Android app, it’s not only using the rules that you’ve explicitly added, but rules from several other sources as well.  Below are the four sources to watch out for:

Rules explicitly added by the developer

These are the rules in a file that you add with `proguardFiles` in your build.gradle.  If you’ve used ProGuard at all, you’re probably accustomed to adding configuration rules here.

build.gradle snippet showing the proguardFiles method
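For reference, that block typically looks like this (the file names are the Android Studio template defaults):

android {
    buildTypes {
        release {
            minifyEnabled true
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
        }
    }
}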

Default ProGuard file

A base set of widely-applicable rules gets added by that “getDefaultProguardFile” line above, which you’ll see in the template build.gradle file.  It protects things like Parcelables, view getters and setters (to enable animations on view properties), and anything with the @Keep annotation.  You should walk through this file to see what ProGuard rules get added by default.  It’s also a great source of example rules when you’re trying to write your own.

snippet of example rules from the default Android ProGuard file
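A few representative rules from that file (lightly abridged; these come from the proguard-android.txt of the support-library era, so treat exact package names as version-dependent):

-keepclassmembers class * implements android.os.Parcelable {
    public static final android.os.Parcelable$Creator CREATOR;
}

-keepclassmembers public class * extends android.view.View {
    void set*(***);
    *** get*();
}

-keep @android.support.annotation.Keep class * {*;}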

AAPT generated rules

When AAPT processes the resources in your app, it generates rules to protect, for example, View classes that are referenced from layout files, and Activities referenced in your manifest. Each rule comes with one or more comments saying where the class was referenced, so you can track down the offending resource if need be.  You can find these rules in:

build/intermediates/proguard-rules/{buildType}/aapt_rules.txt
snippet of example ProGuard rules generated by AAPT
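They look something like this (a hypothetical excerpt; the classes and paths will be your own):

# Referenced at /path/to/project/app/src/main/AndroidManifest.xml:21
-keep class com.example.app.MainActivity { <init>(...); }

# Referenced at /path/to/project/app/src/main/res/layout/activity_main.xml:4
-keep class android.support.design.widget.FloatingActionButton { <init>(...); }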

Note that this source isn’t explicitly added in your build.gradle; the Android Gradle Plugin takes care of including these rules for you.

Rules from libraries

If an Android library has ProGuard configs that apps should use when they include that library, the library can declare those rules with the consumerProguardFiles method in its build.gradle.  For example, here’s a snippet from Butterknife’s build.gradle.

snippet of Butterknife’s build.gradle showing the consumerProguardFiles method
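The relevant piece boils down to something like this (paraphrased rather than copied verbatim, so treat the file name as illustrative):

android {
    defaultConfig {
        consumerProguardFiles 'proguard-rules.pro'
    }
}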


Bringing it all together

Rather than trying to track down every single source of ProGuard config that’s getting added to your app, you can look at them all together.  You can specify

-printconfiguration proguard-config.txt

in your rules file and ProGuard will print out all of the configuration rules to the specified file.  Note that they’re not in the same order that they were originally specified, so it can be hard to figure out where a particular rule is coming from.  

Reading ProGuard’s Outputs

When ProGuard processes an Android app, it generates a few output files to show what happened in each step. These files can be really helpful if you’re trying to figure out what ProGuard changed, or why your code ended up the way it did. But those files aren’t self-documenting, so I’m going to walk you through why each of those files is created and what it shows you.

These files are in the build directory, something like:

app/build/outputs/mapping/{buildType}/

Here’s a diagram I made with a really high-level overview of the steps ProGuard takes when it’s analyzing your app, because the output files line up nicely with these steps.  Refer back to this for some context around the following steps.
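In short: ProGuard reads your configuration and your class files into a class pool, records which code matches your keep rules (seeds.txt), shrinks away the unused code (usage.txt), obfuscates what remains (mapping.txt), and finally writes out the processed classes (with dump.txt as a full listing of the result).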

seeds.txt

The first thing ProGuard does is to read all of your configuration files, and then read in all of the Java bytecode (.class files) to create what it calls the class pool.  ProGuard then looks through the class pool and prints to seeds.txt a list of every class and member that matches any of your keep rules.  This is useful for checking whether the keep rule you wrote actually matches the class you’re trying to keep.

If it’s a class that matches, there will be a line with just the fully-qualified name of the class.  For a member, it will be the fully-qualified class name, followed by a colon, followed by the member’s signature.
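For example, a couple of hypothetical seeds.txt lines in that format:

com.example.app.MainActivity
com.example.app.User: java.lang.String getName()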

usage.txt

Knowing what code it has to keep, ProGuard then goes through the class pool and finds code that it doesn’t need to keep.  This is the shrinking phase, where ProGuard strips out unused code from the app.  As it’s doing this, it prints out unused code — code that’s being removed — to usage.txt.  Now this name seems backwards to me; I think it should be unused.txt or shrinkage.txt or something, but that’s just me.  

This is useful if you’re trying to figure out why a class doesn’t exist at runtime.  You can check whether it got removed here, or got renamed in the next step.

If an entire class is removed, you’ll get a line with the fully-qualified class name.  If only certain members of a class are removed, you get the class name followed by a colon, and then a line (indented with four spaces) for each member that was removed.
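A hypothetical usage.txt excerpt in that format:

com.example.app.UnusedHelper
com.example.app.User:
    java.lang.String nickname
    void setNickname(java.lang.String)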

mapping.txt

The next thing ProGuard needs to do is obfuscate as much code as possible — that is, it’s going to rename classes and members to meaningless names like “a”, “b”, etc.  As it’s doing this, ProGuard prints the old name and new name for every class and member to mapping.txt.  Not all code is renamed, but all code is listed in mapping.txt.

This is the file you need if you’re trying to de-obfuscate a stacktrace.  It allows you to work backwards from the obfuscated names to the original names of code.

Each line is of the form “{old name} -> {new name}”.  You get a line for the class name, then a line for each member of the class.  Note that constructors are shown as “<init>()”.  
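A hypothetical mapping.txt excerpt (real files may also prefix members with line-number ranges):

com.example.app.User -> com.example.app.a:
    java.lang.String name -> a
    void <init>() -> <init>
    java.lang.String getName() -> a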

dump.txt

After ProGuard has done all of its magic (shrinking and obfuscating), it prints out one last file which is essentially a full listing of all the code after processing.  That is, everything that’s left in the class files, but dumped as verbose text, so it’s a huge file.  I have a demo app that I use for testing ProGuard stuff, and the final app is about 1 MB, but the dump.txt is almost 18 MB.  It’s enormous.  Here’s the output for a trivial class:
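It looks something like this (heavily abridged, for a hypothetical class; the exact layout varies by ProGuard version):

+ Program class: com/example/app/a
  Superclass: java/lang/Object
  Fields (count = 1):
    + Field: a Ljava/lang/String;
  Methods (count = 2):
    + Method: <init>()V
    + Method: a()Ljava/lang/String;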

This can be really useful, though, if you want to see what’s in your class files but don’t want to decompile the .class or .dex files.

Archiving

One last note is that these files are important artifacts of your build — especially mapping.txt.  If this is a build you’re going to distribute (say on the Play Store, or even internally for testing), you’re going to need your mapping.txt to de-obfuscate stacktraces.  

Reverse-engineering the fake Haven apps

As happens with successful and interesting apps, somebody made impostor copies of the Guardian Project’s Haven app, which got a lot of press due in part to Edward Snowden’s involvement.  As of this writing, the copycat has a dozen listings on the Play Store, all with little variations in the name.  (Hat tip to @rettiwtkrow, whose tweet was the first I saw of it).

Interestingly, the fake apps have a slightly different icon from the real app.  Below is the real app’s icon on the left, and the fake app’s on the right.

I downloaded half a dozen of these copycats and started reverse-engineering them.  The copycats are all identical to one another; only the package name changes.  But they’re all completely different from the real app.  I wondered at first if they might have ripped off the real app’s code, since it’s open source, but no.

When you open the fake, it’s immediately clear that they’re just using the name to drive downloads, and the functionality of the app isn’t even trying to look real.  It’s a run-of-the-mill crapware “cleaner” app.

It has a tab bar across the bottom, and each tab is a different tool: “Charge Booster”, “Battery Saver”, “CPU Cooler”, and “Junk Cleaner”.    They show some made-up stats about the device, and make promises they can’t deliver.  For the most part, as you run one of these tools, it displays an animation (that isn’t actually tied to any real work being done), and then a full-screen ad.  After one tool, I was presented with a fake Facebook sign-up page.


If you try to apply the “Ultra Power Saving Mode” it sends you to the OS settings app to enable “Allow modify system settings”.  I found this alarming, and assumed this was where I would find the nefarious code.  So I decompiled the app and poked around, and was honestly pretty underwhelmed.

I should note that the impostor app’s targetSdk is 24, meaning it has to ask for permissions at runtime, which is nice and a little surprising, given that Google isn’t yet forcing developers to do so.

Anyway, I need not have worried – after you grant this app the ability to modify system settings, all it does is turn down your screen brightness, disable autorotation, and disable background syncing for other apps.

(such battery savings. many mAh. very wow)

Then I got a little worried again when I saw the app has a BroadcastReceiver that runs immediately when the app is installed or updated.  This means the app has code that runs in the background even if you never open the app.  But again, I shouldn’t have worried because all it does is try to show a toast message saying “[app] Is Optimized by Fast Cleaner & Battery Saver.”

The bottom line is that these copycats appear to be using the Haven name and publicity just to drive downloads, then make a quick buck off of advertising.  I couldn’t find any evidence of anything more sinister than that.

For my last point, let’s talk about attribution.  I can’t say for sure, but I have an educated guess where this came from.  There’s an app on the Google Play Store that shares almost all of its code with these copycats.  Unlike the Haven copycats, that app has accurate screenshots on the Play Store, and its UI matches the Haven impostors’.  Either it’s the same developer, or it’s a false flag.  I can’t rule that out, so I’m not saying this with 100% confidence, but a preponderance of the evidence surely doesn’t look good for that developer.

I assume Google will be along shortly to remove the offending copycats (and hopefully terminate their ad accounts, and the rest of their apps under both publisher listings).  But it’s an important reminder that copycats exist, and it’s important to remain vigilant.

Those Android Versions Dashboards are Overrated

The Android Developers website has a chart showing how many Android devices are on each version of the OS.  You’ve probably seen it before.

Each month that chart is updated, and Android-focused blogs expend a lot of words analyzing the changes. [1 2 3]  Broader tech blogs use it to decry Android’s “fragmentation” problem. [1 2 3 4 5]

But I’m here to let you in on a little secret: that chart doesn’t mean what you think it means, and it’s not all that useful.  That chart represents all Android devices, all over the world, that have connected to the Google Play Store in the last seven days.  

To show why that group of devices isn’t the data we want, let’s compare that to data from one of my personal apps.  And because I’m a nice person, I’m not going to subject you to another pie chart.

For any Android version on the x-axis, the height represents the proportion of devices that are running that version or newer.  These are the numbers you need to make the important decision of “what is the lowest version of Android that I’m going to support?”  As one example, let’s look at Android 7.0.  The Google Play numbers would tell you that only 23.8% of devices are on Android 7.0 or newer, but for my app, that figure is 67.8% — a difference of 44 percentage points.  That’s a night and day difference.

Why so different?

There are a lot of reasons that it’s different, but I’ll cover a few:

  • All over the world.  The Google Play numbers encompass Android devices all over the world.  When you see numbers broken down by country, there are big differences.  In most cases, your target market is not the whole world.
  • Not just phones. Google’s numbers include phones, tablets, and anything else running Android.  There are a lot of weird things running Android.  You don’t need to support somebody’s fridge.  For a lot of apps, you shouldn’t even care about tablets.
  • Not just primary phones.  I personally have at least eight devices that would show up in Google’s chart (i.e. I’ve powered them on in the last seven days), including an old tablet running Android 4.4 that I keep around for testing.  But if you’re trying to target me as a user, you don’t need to care about the oldest device on my desk, you just need to support my primary phone – which is running Android 8.0.

Takeaway

If you’re trying to figure out the lowest version of Android that you’re going to support for an existing app, make sure you’re looking at data from your app.  I guarantee you it’s going to be different from the public Google Play numbers.  And if you’re starting a new app, try to find a source of data that more closely matches your target market.

Most of the apps on my phone aren’t obfuscated

I think it’s important for any app that deals with sensitive user data to incorporate code obfuscation into its security.  While far from impenetrable, it’s a useful layer that makes it harder for reverse-engineers to understand your app and use that knowledge against you.  If you’ve wondered why I seem to be on a code obfuscation kick recently, it’s because I’ve noticed, anecdotally, that a lot of apps I expected to be using obfuscation weren’t.  So I set out to do some research and turn that anecdote into data.

I took all of the apps I have on my phone right now, and calculated how well the code was obfuscated (see methodology notes at the end of the post if you’re curious).  Here are the points that jumped out at me:

  • Most of the apps on my phone aren’t obfuscated. 53% of apps fall in the 0-20% obfuscated bucket, which means they probably didn’t have ProGuard turned on at all.
  • Most of the remaining apps are poorly-to-medium obfuscated. The next 35% of apps are 20-60% obfuscated, which means they probably put some effort into obfuscating the code, but weak configurations (like overusing the -keep directive) have kept much of their code un-obfuscated.
  • A small portion of apps are well obfuscated. Just 12% of apps are in the 60+% obfuscated range, where most of their code is very difficult to follow.

Why would an app with ProGuard turned off have a score greater than 0? A small part of that would be due to false positives (see the “Methodology” section, below) but most is due to third-party library code that has its internal implementation details obfuscated, before that code is even packaged into an app.

Methodology

    • Corpus – As alluded to in the post, I ran this analysis on the apps that I have installed on my personal phone.  This group of apps probably isn’t a perfectly representative sample of apps on the app store.  It is, however, a representative sample of apps that I actually care about 🙂
    • Tools – To calculate the “% obfuscated” metric, I used the `apkanalyzer` tool included in the Android SDK (see the example command after this list).  Looking through the classes, methods, and fields, I counted the number that appear to have been obfuscated (see next point), and added them all up for a ratio.
    • Metric – Deciding whether something has been obfuscated isn’t straightforward; different obfuscation methods will yield different results.  But I know ProGuard is a common tool for Android obfuscation, and it prefers to transform names to single-character names, so I checked for single-character names.  This has the potential for false positives (e.g. x, y, and z in a graphics data class would register as obfuscated even though those are probably the original names) and false negatives (another obfuscation tool might choose names which aren’t single-character, which this analysis would miss).
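A command along these lines produces the raw list to count from (a sketch; the APK name is illustrative, and --defined-only limits the output to code actually defined in the app rather than merely referenced by it):

apkanalyzer dex packages --defined-only app-release.apk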


ProGuard pro-tip: Don’t use ProGuard’s “-include” with Gradle builds

I like to organize my ProGuard rules by breaking them up into separate files, grouped by what they’re acting on — my data model, third-party libs, etc.

It’s tempting to reference one main ProGuard file from your build.gradle, and then put ‘-include’ entries in the main file to point to the rest, but there’s a problem with this. If you make a change in one of the included files, it won’t be picked up until you do a clean build.

The reason for this is Gradle’s model of inputs and outputs for tasks.  Gradle is really smart about avoiding unnecessary work, so if none of the inputs for a task have changed, then Gradle won’t rerun the task.  If you list one main ProGuard file in your build.gradle and include the rest from there, Gradle only sees the main file as an input.  So if you make changes in an included file, Gradle doesn’t think anything changed.

The easy way I’ve found to work around this is to put all of your ProGuard rules into a directory, and include them all in your build.gradle with this snippet:
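A sketch, assuming your rule files live in a proguard/ directory inside the module and use a .pro extension:

android {
    buildTypes {
        release {
            minifyEnabled true
            proguardFiles getDefaultProguardFile('proguard-android.txt')
            // listing each file individually makes every one an input that Gradle tracks
            proguardFiles fileTree(dir: 'proguard', include: ['*.pro']).asList().toArray()
        }
    }
}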

Distinguishing between the different ProGuard “-keep” directives

If you search for ProGuard rules for a Java or Android library, you’ll see a lot of answers on StackOverflow that tell you to do something like this:

-keep class com.foo.library.** { *; } 

That advice is really bad, and you should never do it.  First, it’s overly broad — the double-asterisk in the package means every class under every package under that top-level package, and the asterisk inside the curly braces applies to every member (fields, methods, and constructors) inside those classes.  That is, it applies to all code in the library.  If you use that rule, Jake Wharton is going to come yell at you.

Second, and what this post is about, is the beginning of the directive, that “-keep”.  You almost never want to use -keep; if you do need a ProGuard rule, you usually want one of the more specific variants below.  But it always takes me a minute with the ProGuard manual to figure out which one of those variants applies to my case, so I made some tables for quick visual reference.  (Quick aside: the ProGuard manual is very useful and I highly recommend you look through it.)

No rule

To get our bearings, let’s look at the default.  If you don’t specify a keep directive of any kind, then ProGuard is going to do its normal thing — it will both shrink (i.e. remove unused code) and obfuscate (i.e. rename things), for classes and class members alike.

-keep

See, this is why I said you should almost never use -keep.  -keep disables all of ProGuard’s goodness.  No shrinking, no obfuscation; not for classes, not for members.  In real use cases, you can let ProGuard do at least some of its work.  Even if your variables are accessed by reflection, you could still remove and rename unused classes, for example.  So let’s look through the more specific -keep variants.

-keepclassmembers

This protects only the members of the class from shrinking and obfuscation.  That is, if a class is unused, it will be removed.  If the class is used, the class will be kept but renamed.  But inside any class that is kept around, all of its members will be there, and they will have their original names.

-keepnames

This allows shrinking for classes and members, but not obfuscation.  That is, any unused code is going to get removed.  But the code that is kept will keep its original names.

-keepclassmembernames

This is the most permissive keep directive; it lets ProGuard do almost all of its work.  Unused classes are removed, the remaining classes are renamed, unused members of those classes are removed, but then the remaining members keep their original names.

-keepclasseswithmembers

This one doesn’t get a table, because it’s the same as -keep.  The difference is that it only applies to classes that have all of the members named in the class specification.

-keepclasseswithmembernames

Similarly, this rule is the same as -keepnames.  The difference, again, is that it only applies to classes that have all of the members named in the class specification.
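To condense all of the above into one quick reference (✓ = ProGuard may do it, ✗ = the directive blocks it):

Directive               | Shrink classes | Rename classes | Shrink members | Rename members
no rule                 |       ✓        |       ✓        |       ✓        |       ✓
-keep                   |       ✗        |       ✗        |       ✗        |       ✗
-keepclassmembers       |       ✓        |       ✓        |       ✗        |       ✗
-keepnames              |       ✓        |       ✗        |       ✓        |       ✗
-keepclassmembernames   |       ✓        |       ✓        |       ✓        |       ✗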

Conclusion

You want to let ProGuard do as much work as possible, so pick the directive with the fewest ✗ marks in the table above, while still meeting your need.

Getting Started with Android Things

I’m giving a presentation tonight at the Charlotte Android Developer / GDG meetup about Getting Started with Android Things.

Update: here’s the video!

Here are the slides.

And some relevant links:

An Android runtime permissions corner case

I recently got curious about what happens to the runtime permissions state when the user revokes or grants a permission using the OS Settings app, rather than allowing or denying in the dialog prompt that an app can show to request permission.

For background, there are two parts to the state that an app can query at any time:

The question I got curious about was this: what happens when the user revokes a permission in the settings app?  If they had previously chosen “don’t ask again”, but then later grant the permission in settings, then revoke the permission again, can you prompt for the permission?

I made a simple app to try it out, and mapped out the possible states.  The answer is that if the user revokes a permission in settings, all of that history essentially goes away.  You’re left in the same state as if you had asked for permission and the user had denied it once, but you can ask again.

I realize this isn’t the sort of thing that’s going to happen in the course of normal usage; I don’t expect most users to ever toggle those permissions from the settings app.  But I think it’s worth understanding how it works.

How (often) Does a Bill Become a Law?

[tl;dr — about a quarter of the time]

Every year as state legislative sessions get started, there’s a flurry of scary headlines of the form: “[State] [Party] files bill to do [thing I find scary, bigoted, or irresponsible]”.  It’s tough to know from reading whether this is something you should get concerned about.  

Maybe you should, maybe this is the next big bill that will make your state a national embarrassment, spark economic boycotts, and cause your Governor to lose his re-election campaign.  Or maybe this is a crackpot bill and it’s not going anywhere, so don’t waste your time.  

Good reporting can help to fill this in – is this a senior legislator, are they a known kook, etc.  But there’s another, more general piece of context that I wanted to help provide.  My question is, overall: how often does a bill go from being filed (its first official step) to becoming law (its final official step)?

First you need at least a rough understanding of how a legislature works.  You could watch the classic Schoolhouse Rock video:

Or I’ll give you the 60-second version (skip to the next paragraph if you already know how it works): a legislator files a bill, which is a thing that they want to become law.  The person who files it is called the “sponsor” or “patron.”  It’s referred to a relevant committee, which is made up of a subset of the legislators in their chamber (the House or Senate, usually).  If a majority of the members of the committee vote for it, then it goes to the full chamber; if it gets a majority vote there, then it’s handed over to the other chamber, and the committee and full-chamber process repeats.  After all that, if the governor signs the bill, it becomes law.  (Exact details vary by state, but it’s probably pretty close to that, unless you’re in Nebraska.  Find your legislature’s website and they probably have a “how a bill becomes a law” page to explain their version of the process.)

The important thing to know is that bills can (and do) fail at any of those steps in the previous paragraph.  Depending on the rules of the chamber, if the leadership doesn’t like the sponsor, or the bill, it may never even come up for a vote.  Or those votes will happen in a closed session of a subcommittee, where the public doesn’t get to see which legislators voted to kill a bill.

Back to the point, I downloaded the history of every bill filed in the North Carolina General Assembly (NCGA), my state legislature, since 1985 and parsed out its fate.  The answer to my question was 23.6% — just under a quarter of our bills actually become law.  

So when you see that scary headline, keep in mind that a bill being filed is actually more likely to fail without becoming law, than to succeed.  I’m not telling you to ignore it, or not to fight it — on the contrary, this should encourage you to fight.  Be part of the reason that the bill you don’t like failed.  Get out there and kill that bill.

I’m sorry, I had to.  I don’t even like this movie, but the pun was just sitting right there.

Since I’ve got all this data, I decided to answer another question quickly: how often does the governor veto bills?  This next chart shows how often, when given the option to sign or veto a bill, the governor chose to veto:

So, uh, not often.  There’s a tiny little spike in the 2011-2012 session, when we had a Democratic governor and a Republican-controlled legislature, but even then it barely registers on the chart.

If you got this far and you think this was at least a little neat, I want to do something for you.  I’ve got all this data downloaded and parsed, and I’m sifting through it, but I want to know what questions I should try to answer.  Here are a few that I want to tackle next:

  • a breakdown of where bills fail – in committee, in the full chamber, in the other chamber.
  • which legislators get more of their bills passed?
  • are bills with more co-sponsors more likely to pass?

Can you come up with others?  If you have a question or a hypothesis, tweet it at me @jebstuart and I’ll see if I can answer it.

Following are a few notes about methodology & assumptions:

Why start in 1985? That’s as far back as the easily-available online records go (and almost my full lifetime) so I’m calling that “the relevant dataset”.  I feel like it’s enough data to get a good sense of the overall pattern.  

Why are there only data points every other year?  The NCGA operates in two-year sessions (e.g. we’re currently in the 2017-2018 session), and that’s how their bills are filed.

What about resolutions?  An NC legislator can file a “bill” or a “resolution”, but resolutions are typically honorary things without legal weight, so I excluded them from the dataset as not relevant to the question I wanted to answer.

What about “extra” (or “special”) sessions?  I didn’t include them either.  They’re typically single-purpose sessions with few bills filed, and I didn’t want to clutter the data.

Are the numbers the same for [other state]? Not the exact same, for sure.  Maybe similar, maybe not.  I have a hunch these results are probably “typical”, within a standard deviation or so, but I can’t verify that without a lot of work that I’m not going to do.