Most of the apps on my phone aren’t obfuscated

I think it’s important for any app that deals with sensitive user data to incorporate code obfuscation into its security.  While far from impenetrable, it’s a useful layer that makes it harder for reverse-engineers to understand your app and use that knowledge against you.  If you’ve wondered why I seem to be on a code obfuscation kick recently, it’s because I’ve noticed, anecdotally, that a lot of apps I expected to be using obfuscation weren’t.  So I set out to do some research and turn that anecdote into data.

I took all of the apps I have on my phone right now, and calculated how well the code was obfuscated (see methodology notes at the end of the post if you’re curious).  Here are the points that jumped out at me:

  • Most of the apps on my phone aren’t obfuscated. 53% of apps fall in the 0-20% obfuscated bucket, which means they probably didn’t have ProGuard turned on at all.
  • Most of the remaining apps are poorly-to-medium obfuscated. The next 35% of apps are 20-60% obfuscated, which means they probably put some effort into obfuscating the code, but weak configurations (like overusing the -keep directive) have kept much of their code un-obfuscated.
  • A small portion of apps are well obfuscated. Just 12% of apps are in the 60+% obfuscated range, where most of their code is very difficult to follow.

Why would an app with ProGuard turned off have a score greater than 0? A small part of that would be due to false positives (see the “Methodology” section, below) but most is due to third-party library code that has its internal implementation details obfuscated, before that code is even packaged into an app.

Methodology

    • Corpus – As alluded to in the post, I ran this analysis on the apps that I have installed on my personal phone.  This group of apps probably isn’t a perfectly representative sample of apps on the app store.  It is, however, a representative sample of apps that I actually care about 🙂
    • Tools – To calculate the “% obfuscated” metric, I used the `apkanalyzer` tool included in the Android SDK.  Looking through the classes, methods, and fields, I counted how many appear to have been obfuscated (see next point), then divided by the total for a ratio.
    • Metric – Deciding whether something has been obfuscated isn’t straightforward; different obfuscation tools will yield different results.  But I know ProGuard is the most common obfuscator for Android, and it prefers to transform names into single-character names, so I checked for single-character names.  This has the potential for false positives (e.g. x, y, z in a graphics data class would register as obfuscated even though those are probably the original names) and false negatives (another obfuscator might choose names that aren’t single characters, which this analysis would miss).  A rough sketch of the measurement follows this list.
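Here’s roughly what that measurement looks like (a sketch, not the exact script I ran; apkanalyzer’s output format varies a bit by version, so the parsing below is approximate, and my-app.apk is a placeholder):

# dump every class, method, and field defined in the APK
apkanalyzer dex packages --defined-only my-app.apk > members.txt

# count what fraction of the simple names are a single character
# (this sketch only parses class and field lines; method lines need fancier handling)
awk '$1 == "C" { total++; n = split($NF, p, /[.$]/); if (length(p[n]) == 1) hits++ }
     $1 == "F" { total++; if (length($NF) == 1) hits++ }
     END { printf "%.1f%% single-character names\n", 100 * hits / total }' members.txt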


ProGuard pro-tip: Don’t use ProGuard’s “-include” with Gradle builds

I like to organize my ProGuard rules by breaking them up into separate files, grouped by what they’re acting on — my data model, third-party libs, etc.

It’s tempting to reference one main ProGuard file from your build.gradle, and then put ‘-include’ entries in the main file to point to the rest, but there’s a problem with this. If you make a change in one of the included files, it won’t be picked up until you do a clean build.

The reason for this is Gradle’s model of inputs and outputs for tasks.  Gradle is really smart about avoiding unnecessary work, so if none of the inputs for a task have changed, then Gradle won’t rerun the task.  If you list one main ProGuard file in your build.gradle and include the rest from there, Gradle only sees the main file as an input.  So if you make changes in an included file, Gradle doesn’t think anything changed.

The easy way I’ve found to work around this is to put all of your ProGuard rules into a directory and include them all in your build.gradle with a snippet like this:
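(A sketch for a Groovy build.gradle; I’m assuming the rule files live in a proguard/ directory and end in .pro, so adjust to taste.)

android {
    buildTypes {
        release {
            minifyEnabled true
            proguardFiles getDefaultProguardFile('proguard-android.txt')
            // every .pro file in proguard/ becomes its own task input,
            // so a change to any one of them triggers a re-run
            proguardFiles fileTree(dir: 'proguard', include: ['*.pro']).asList().toArray()
        }
    }
}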

Distinguishing between the different ProGuard “-keep” directives

If you search for ProGuard rules for a Java or Android library, you’ll see a lot of answers on StackOverflow that tell you to do something like this:

-keep class com.foo.library.** { *; } 

That advice is really bad, and you should never do it.  First, it’s overly broad — that double-asterisk in the package means every class under every package under that top-level package; and the asterisk inside the curly braces applies to every member (fields, methods, and constructors) inside those classes.  That is, it applies to all code in the library.  If you use that rule, Jake Wharton is going to come yell at you.

Second, and what this post is about, is the beginning of the directive, that “-keep”.  You almost never want to use -keep; if you do need a ProGuard rule, you usually want one of the more specific variants below.  But it always takes me a minute with the ProGuard manual to figure out which one of those variants applies to my case, so I made some tables for quick visual reference.  (Quick aside: the ProGuard manual is very useful and I highly recommend you look through it.)

No rule

To get our bearings, let’s look at the default.  If you don’t specify a keep directive of any kind, then ProGuard does its normal thing — it both shrinks (i.e. removes unused code) and obfuscates (i.e. renames things) classes and class members alike.

-keep

See, this is why I said you should almost never use -keep.  -keep disables all of ProGuard’s goodness.  No shrinking, no obfuscation; not for classes, not for members.  In real use cases, you can let ProGuard do at least some of its work.  Even if your fields are accessed by reflection, you could still remove unused classes and rename the rest, for example.  So let’s look through the more specific -keep variants.

-keepclassmembers

This protects only the members of the class from shrinking and obfuscation.  That is, if a class is unused, it will be removed.  If the class is used, the class will be kept but renamed.  But inside any class that is kept around, all of its members will be there, and they will have their original names.
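The classic use case is reflection-based serialization.  For example (a sketch; the package name is a placeholder), this lets a library like Gson keep setting fields by name, while the classes themselves still get shrunk and renamed:

-keepclassmembers class com.example.model.** {
    <fields>;
}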

-keepnames

This allows shrinking for classes and members, but not obfuscation.  That is, any unused code is going to get removed.  But the code that is kept will keep its original names.

-keepclassmembernames

This is the most permissive keep directive; it lets ProGuard do almost all of its work.  Unused classes are removed, the remaining classes are renamed, unused members of those classes are removed, but then the remaining members keep their original names.
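Pulling those five together in one view (✓ means ProGuard is free to do that work, ✗ means the code or name is kept as-is):

                          shrink    obfuscate   shrink    obfuscate
                          classes   classes     members   members
no rule                      ✓         ✓           ✓         ✓
-keep                        ✗         ✗           ✗         ✗
-keepclassmembers            ✓         ✓           ✗         ✗
-keepnames                   ✓         ✗           ✓         ✗
-keepclassmembernames        ✓         ✓           ✓         ✗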

-keepclasseswithmembers

This one doesn’t get a table, because it’s the same as -keep.  The difference is that it only applies to classes that have all of the members listed in the class specification.

-keepclasseswithmembernames

Similarly, this rule is the same as -keepnames.  The difference, again, is that it only applies to classes that have all of the members listed in the class specification.  This one has a classic use case, shown below.
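The canonical use is native methods: JNI looks them up by name, so any class that declares a native method needs both its own name and the method’s name left intact, while unused classes can still be stripped.  This rule (or something very close to it) ships in most default Android ProGuard configs:

-keepclasseswithmembernames class * {
    native <methods>;
}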

Conclusion

You want to let ProGuard do as much work as possible, so pick the directive with the fewest ✗ marks in the summary table above, while still meeting your need.

Getting Started with Android Things

I’m giving a presentation tonight at the Charlotte Android Developer / GDG meetup about Getting Started with Android Things.

Update: here’s the video!

Here are the slides.

And some relevant links:

An Android runtime permissions corner case

I recently got curious about what happens to the runtime permissions state when the user revokes or grants a permission using the OS Settings app, rather than allowing or denying in the dialog prompt that an app can show to request permission.

For background, there are two parts to the state that an app can query at any time: whether the permission is currently granted, and whether you should show a rationale before asking again.
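In code, that query looks something like this (a sketch using the support library APIs of that era; CAMERA is just an example permission):

import android.Manifest;
import android.content.pm.PackageManager;
import android.support.v4.app.ActivityCompat;
import android.support.v4.content.ContextCompat;

// inside an Activity:
boolean granted = ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
        == PackageManager.PERMISSION_GRANTED;

// false both before the first request and after the user picks "don't ask again"
boolean shouldExplain = ActivityCompat.shouldShowRequestPermissionRationale(
        this, Manifest.permission.CAMERA);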

The question I got curious about was this: what happens when the user revokes a permission in the settings app?  If they had previously chosen “don’t ask again”, but then later granted the permission in settings, and then revoked it again, can you prompt for the permission?

I made a simple app to try it out and mapped the possible states.  The answer is that if the user revokes a permission in settings, all of that history essentially goes away.  You’re left in the same state as if you had asked for permission and the user had denied it once, but you can ask again.

I realize this isn’t the sort of thing that’s going to happen in the course of normal usage; I don’t expect most users to ever toggle those permissions from the settings app.  But I think it’s worth understanding how it works.

How (often) Does a Bill Become a Law?

[tl;dr — about a quarter of the time]

Every year as state legislative sessions get started, there’s a flurry of scary headlines of the form: “[State] [Party] files bill to do [thing I find scary, bigoted, or irresponsible]”.  It’s tough to know from reading whether this is something you should get concerned about.  

Maybe you should, maybe this is the next big bill that will make your state a national embarrassment, spark economic boycotts, and cause your Governor to lose his re-election campaign.  Or maybe this is a crackpot bill and it’s not going anywhere, so don’t waste your time.  

Good reporting can help to fill this in – is this a senior legislator, are they a known kook, etc.  But there’s another, more general piece of this context that I wanted to help fill in.  My question is, overall: how often does a bill go from being filed (its first official step) to becoming law (its final official step)?

First you need at least a rough understanding of how a legislature works.  You could watch the classic Schoolhouse Rock video.

Or I’ll give you the 60-second version (skip to the next paragraph if you already know how it works): a legislator files a bill, which is a thing that they want to become law.  The person who files is called the “sponsor” or “patron.”  It’s referred to a relevant committee, which is made up of a subset of the legislators in their chamber (the House or Senate, usually).  If a majority of the members of the committee vote for it, then it goes to the full chamber; if it gets a majority vote there, then it’s handed over to the other chamber, and the committee and full-chamber process repeats.  After that, the governor signs the bill and it becomes a law.  (Exact details vary by state, but it’s probably pretty close to that, unless you’re in Nebraska.  Find your legislature’s website and they probably have a “how a bill becomes a law” page to explain their version of the process.)

The important thing to know is that bills can (and do) fail at any of those steps in the previous paragraph.  Depending on the rules of the chamber, if the leadership doesn’t like the sponsor, or the bill, it may never even come up for a vote.  Or those votes will happen in a closed session of a subcommittee, where the public doesn’t get to see which legislators voted to kill a bill.

Back to the point, I downloaded the history of every bill filed in the North Carolina General Assembly (NCGA), my state legislature, since 1985 and parsed out its fate.  The answer to my question was 23.6% — just under a quarter of our bills actually become law.  

So when you see that scary headline, keep in mind that a bill that’s been filed is actually more likely to fail than to become law.  I’m not telling you to ignore it, or not to fight it — on the contrary, this should encourage you to fight.  Be part of the reason that the bill you don’t like failed.  Get out there and kill that bill.

I’m sorry, I had to.  I don’t even like this movie, but the pun was just sitting right there.

Since I’ve got all this data, I decided to answer another question quickly: how often does the governor veto bills?  This next chart shows how often, when given the option to sign or veto a bill, the governor chose to veto:

So, uh, not often.  There’s a tiny little spike in the 2011-2012 session, when we had a Democratic governor and a Republican-controlled legislature, but even then it barely registers on the chart.

If you got this far and you think this was at least a little neat, I want to do something for you.  I’ve got all this data downloaded and parsed, and I’m sifting through it, but I want to know what questions I should try to answer.  Here are a few that I want to tackle next:

  • a breakdown of where bills fail – in committee, in the full chamber, in the other chamber.
  • which legislators get more of their bills passed?
  • are bills with more co-sponsors more likely to pass?

Can you come up with others?  If you have a question or a hypothesis, tweet it at me @jebstuart and I’ll see if I can answer it.

Following are a few notes about methodology & assumptions:

Why start in 1985? That’s as far back as the easily-available online records go (and almost my full lifetime) so I’m calling that “the relevant dataset”.  I feel like it’s enough data to get a good sense of the overall pattern.  

Why are there only data points every other year?  The NCGA operates in two-year sessions (e.g. we’re currently in the 2017-2018 session), and that’s how their bills are filed.

What about resolutions?  An NC legislator can file a “bill” or a “resolution”, but resolutions are typically honorary things without legal weight, so I excluded them from the dataset as not relevant to the question I wanted to answer.

What about “extra” (or “special”) sessions?  I didn’t include them either.  They’re typically single-purpose sessions with few bills filed, and I didn’t want to clutter the data.

Are the numbers the same for [other state]? Not the exact same, for sure.  Maybe similar, maybe not.  I have a hunch these results are probably “typical”, within a standard deviation or so, but I can’t verify that without a lot of work that I’m not going to do.

Gerrymandering in NC, or, A Tale of Two States

I moved to North Carolina in 2011 and, as I’ve come to learn more about our politics, I’ve been struck by something that seems impossible: North Carolina has two separate realms of politics.  In statewide elections, NC is split pretty much 50/50 between the two major parties.  But in races where the state is split into districts, like the state legislature, the Republican Party controls a supermajority of seats (over 60%).  That supermajority in the NC House and Senate lets the Republicans override the Governor’s veto, severely limiting the executive’s power.

In the first chart are the statewide races – some are won by Democrats, some by Republicans, but they’re all within a couple percent of 50/50.  In the second chart, we see the various districted houses in North Carolina, and all of them have 60+% Republican wins.  I’ve spent much of the last year puzzling over how these two types of elections could produce such different results from the same voters.  My background is in computer science and math, so I’ve tried to find an answer to this question the best way I can: using data.

Our state legislature, the North Carolina General Assembly (NCGA), is elected every two years, as are our Congressional representatives.  The US President and our statewide offices (Governor, Lieutenant Governor, Attorney General, etc.) are elected every four years, and US Senators every six.  So we have a lot of data to work with.  I started by downloading the raw election result data for each of the previously-mentioned races for the last fifteen years from the State Board of Elections.  (Before 2002, the data gets harder to access and less detailed.)

What I found is that, before I moved here, elections weren’t quite so lopsided.
Clearly something changed, abruptly, between 2010 and 2012.  I don’t think it was me moving to Charlotte.  More importantly, it wasn’t that the electorate faced a massive shift between parties — if you look at statewide elections, the party splits stayed roughly the same.  There’s some variability as you look at individual races, but you see they’re mostly within the 45-55% range.

So what changed?

The district maps changed.  Following the 2010 census, as happens after each census, district boundaries were redrawn in order to rebalance the shifting population across the districts.  But in 2010, the Republican Party had just won control of the NCGA, so they got to control the redistricting process – the prior maps had been drawn by a legislature controlled by the Democratic Party.

Now it’s important to understand how strategic redistricting works.  If you want to skew the district map to your party’s advantage, you create a small number of districts with as many of your opponent’s voters as you can cram in (“packing”), and spread the rest of their voters across a larger number of districts that have just enough of your voters to reliably win (“cracking”).  If you’re unfamiliar, I’d recommend taking a minute to read Wikipedia’s primer, which has some helpful examples of why this is so effective.  Also, fun fact, the first district they show as an example of egregious gerrymandering is the district I was in when I first moved to Charlotte, the NC-12th.

In any two-candidate race, roughly half of the votes are “wasted” — that is, they don’t contribute to the election of a candidate.  This includes all of the votes for the losing candidate, and all of the votes for the winner past a simple majority.  For example, in a 100-vote district that splits 65-35, the loser wastes all 35 of their votes, while the winner wastes only 14 (everything past the 51 needed to win).  If you pack and crack effectively, you can skew the districts so that your party wastes fewer of its votes, and your opponent wastes more of theirs, and you can create a map where a 50/50 citizenry elects a supermajority of one party.

So the goal of gerrymandering is to waste as many of your opponent’s votes, and as few of your own votes, as possible.  Let’s see if we notice any change in wasted votes after the 2010 redistricting.  To keep the chart easier to read, let’s start with just one district map – the NC House of Representatives – and look at the percent of votes that each party wasted.

There it is — a pretty clear jump between close margins in 2002-2010, to very lopsided vote wasting in 2012 and later.  And lest you think I cherry-picked the worst map, I actually chose the least dramatic.  The NC Senate and US House results are even more lopsided.  

In case you’re not a chart person, here’s what that last one says — since 2012, the Congressional maps have been drawn so that over ⅔ of Democrats are wasting their votes, and less than ⅓ of Republicans.  

If you look at all of the elections 2002-2010, the highest percentage of wasted votes that any party had was 59.6% (Republicans in the Senate in 2006).  In every single election 2012-2016, the Democrats have wasted more than that.  It’s impossible for me to overstate that sharp division.  Every single election since 2012 has been more skewed than the worst outlier of the decade before.  Every single election since 2012 has been less fair than all of the elections of the prior decade.  

Back to my original question – how did we end up with 50/50 elections for governor, but a legislature that’s had 60% or more seats go to one party, the same party, in every election since 2012?  The best answer I have for that is to look at the districting.  I believe that the data show that the districts were drawn to systematically favor one party by wasting more of their opponent’s votes.


Certificate Pinning with OkHttp

Now that you know how to run a man-in-the-middle attack, you want to know how to prevent one. The way we’re going to do that is by “pinning” your certificate.

How does it work?

HTTPS begins its connection by verifying the certificate that’s being presented by the server. It checks all of the signatures in the chain of trust, and it checks that the root certificate authority (CA) is one of the trusted roots installed in your OS. What we do with pinning is add one more check — the client checks that one of the certificates in the chain of trust is exactly the certificate it’s expecting.

The man-in-the-middle attack that we ran relied on installing a bogus trusted root CA in the operating system, which mitmproxy used to create a bogus chain of trust. But if you’ve pinned the real certificate, this attack fails, because mitmproxy can’t issue a bogus certificate with your server’s real key.

OK let’s do it

OkHttp3

    1. Set up a certificate pinner on your OkHttpClient with a dummy signature for your domain (see the first sketch after this list).
    2. Run with that certificate pinner, and you’ll get an SSLPeerUnverifiedException showing the expected hash (your bogus one) and the actual hashes for each certificate in the chain.
    3. Replace the bogus hash in your code with the hash of the certificate you want to pin, and run it again.
    4. Now try running with mitmproxy again to verify that it fails.
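Here’s what that dummy pinner might look like for OkHttp3 (a sketch; example.com and the all-A hash are placeholders for your domain and a deliberately bogus pin):

import okhttp3.CertificatePinner;
import okhttp3.OkHttpClient;

CertificatePinner certificatePinner = new CertificatePinner.Builder()
        // bogus pin on purpose; the resulting failure will print the real hashes
        .add("example.com", "sha256/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=")
        .build();

OkHttpClient client = new OkHttpClient.Builder()
        .certificatePinner(certificatePinner)
        .build();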
OkHttp2
  1. Same steps as above, but here’s the code for your dummy pinner:
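Roughly the equivalent for OkHttp2 (a sketch; note the package change, and that OkHttp2’s pinner used SHA-1 pins — again, example.com and the pin value are placeholders):

import com.squareup.okhttp.CertificatePinner;
import com.squareup.okhttp.OkHttpClient;

CertificatePinner certificatePinner = new CertificatePinner.Builder()
        .add("example.com", "sha1/AAAAAAAAAAAAAAAAAAAAAAAAAAA=")
        .build();

OkHttpClient client = new OkHttpClient();
client.setCertificatePinner(certificatePinner);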

Let’s talk about some of the things you need to think about:

Renewals – certificates aren’t forever

When you look at any of the certificates in your chain, you’ll notice that they all have expiration dates. Sometime before that expiration date, your system administrator is (hopefully) going to renew that certificate – that is, they’ll get a new certificate for your server. If you’re not planning ahead and talking to your sysadmin, this could break your connections.

Pin the whole certificate or just the public key?

If you were manually verifying the certificate, you might think of hashing the whole certificate and ensuring that matches your expected value. That’s going to be brittle, because if the certificate gets renewed and you’re pinning to the entire certificate, your pin is going to break.

That’s why, in the example above, OkHttp is pinning the public key info, not the whole certificate. If the certificate gets renewed and the new certificate has the same public key, your pin should continue to work fine.

Which certificate in the chain to pin?

You can pin any of the certificates in your chain of trust – the root CA, an intermediate CA, or your server’s certificate. Let’s look at some pros and cons of each.

If you pin one of the CA certificates, then as long as your server gets its new certificate from the same CA, it doesn’t matter if your certificate’s public key changes. This is potentially useful if your server is hosted by somebody else and you might change providers. The downside is that you’re locked into that CA: you have to get your renewed certificate from the same CA, and their public key has to be unchanged.

On the other hand, if you pin your server’s certificate, then you can get your renewed certificate from any CA you want, as long as your server’s public key is unchanged.

Bottom line: talk to your system administrator to ensure that your pinning won’t break when they renew the certificate.

UPDATE — see also:

How, and Why, to run a Man-In-The-Middle Attack on Your Own App

Wait, what? Why would I want to do that?

Lots of good reasons:

  • To see the actual traffic you’re sending over the network, for debugging purposes.
  • To see what third-party libraries might be sending, and how they’re sending it.
  • To demonstrate how trivial it is to do, as a pre-condition for mitigating it.
But I’m using HTTPS, so I can’t MITM my traffic.

(side note: any day I can use an Independence Day GIF is a good day)

Yes, by using HTTPS, a random third-party can’t decrypt your payloads.

[screenshot: HTTP vs. HTTPS traffic, side by side]

But while HTTPS protects you from a third-party listening in to your traffic, the endpoints are still vulnerable.

[diagram: Alice, Bob, and Carol]

I don’t want to get sidetracked with a detailed explanation of how HTTPS protects you, so here’s the short version: First, you verify the server’s identity using the Public Key Infrastructure. The server presents a certificate saying “I am example.com”. That certificate has been signed by a trusted third-party, called a Root Certificate Authority (CA). That signature says “I am Trusted CA, Inc. and that really is example.com”. Your OS has a couple hundred root CA certificates installed, so it can be sure that it’s really Trusted CA, Inc. that signed the certificate. (In reality, the server’s certificate has actually been signed by an intermediary CA, which was in turn signed by the root CA. We call this the chain of trust).

After you’ve established the server’s identity, you exchange public keys, and can encrypt messages to each other that can only be decrypted by the known party – no third party can listen in to your encrypted messages and see what you’re saying.

Since we can’t decrypt your HTTPS payloads, we’re going to attack by making a fake root CA and installing it as one of the device’s trusted roots.

Isn’t that hard?

Nope. We’re going to install a tool that handles it all for you.

Step 1. Install mitmproxy on your dev machine.

Step 2. Run mitmproxy on your dev machine. Down in the bottom-right corner, it’ll tell you what port it’s running on. Also, make note of your dev machine’s IP address.
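If you haven’t used mitmproxy before, the install-and-run is minimal (a sketch; pip and Homebrew are two common routes, and port 8080 is mitmproxy’s default):

pip install mitmproxy    # or: brew install mitmproxy
mitmproxy                # interactive console proxy; listens on port 8080 by default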

Step 3. Connect your Android (or iOS or whatever else) device to the same network as your dev machine, and in your network settings, set your proxy to your dev machine’s IP address and the port that mitmproxy is running on. The exact details of how to do this vary by OS version, so you’ll have to google that for yourself.

Step 4. On your target device, open a browser and go to mitm.it. This magic domain will help you install mitmproxy’s certificate as a trusted root CA on your device.

Step 5. Run your app, and watch mitmproxy dump all of your traffic.

[screenshot: mitmproxy showing the app’s HTTPS traffic in cleartext]

So what do I do now?

You pin your certificates. I’ll follow up with another blog post soon to help you do that (edit: follow-up post on certificate pinning is up). (Good news: it’s pretty easy)

Also, you learn a little more about HTTPS. Knowledge is power.

Self-Signed Certificates with OkHttp – the Right Way

Your sysadmin comes to you and says “hey, let’s quit using the developmestruction environment, I set up this new test server we can test with instead.” Awesome.  This is definitely a good thing for your development.

So you switch the URL your app is calling, but now you’re seeing this:

[screenshot: the connection fails with an SSL error]

Why can’t you connect?  Well, SSL certs cost time and money, so a lot of times in internal test and development environments you’ll see self-signed certificates.  By default, OkHttp isn’t going to trust those, since they aren’t signed by a known, trusted Certificate Authority (CA).

At this point, you may find some StackOverflow answers suggesting that you make a dummy TrustManager that just blindly accepts any SSL certificate.  Don’t do that.  You may as well disable SSL at that point, because anybody can run a man-in-the-middle attack to read and/or manipulate your traffic.  Seriously, just don’t.  Even for your test environment.  

The good news is, it’s just as easy to fix the right way by adding trust for your self-signed certificate.  Here’s all it takes:

Step 1: Download the .cer file

[screenshot: Chrome’s broken lock icon]

Open the URL in Chrome.  You’ll see the broken lock icon in the address bar; click it. Your goal is to drag the certificate to someplace you can work with it; Chrome will give you a .cer file.  (screenshots are small, click to embiggen)

[screenshots: Chrome’s connection details and the certificate]

Step 2: Convert to .pem, using this in your terminal (and maybe spend a minute with “man openssl” to see what’s up here – we’re converting from one certificate file format to another):

openssl x509 -in server.cer -inform DER -out server.pem -outform PEM

Step 3: Drop the .pem in your app’s assets folder

Step 4a: Here’s how to add that custom cert to OkHttp3
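A sketch of what that looks like (I’m assuming the file is named server.pem, that a Context is in scope, and that this lives in a method declaring the checked exceptions):

import java.io.InputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;
import okhttp3.OkHttpClient;

// load the self-signed certificate from assets
CertificateFactory cf = CertificateFactory.getInstance("X.509");
Certificate ca;
try (InputStream in = context.getAssets().open("server.pem")) {
    ca = cf.generateCertificate(in);
}

// build a KeyStore containing only our certificate
KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
keyStore.load(null, null);
keyStore.setCertificateEntry("ca", ca);

// create a TrustManager that trusts exactly that KeyStore
TrustManagerFactory tmf = TrustManagerFactory.getInstance(
        TrustManagerFactory.getDefaultAlgorithm());
tmf.init(keyStore);
X509TrustManager trustManager = (X509TrustManager) tmf.getTrustManagers()[0];

SSLContext sslContext = SSLContext.getInstance("TLS");
sslContext.init(null, new TrustManager[] { trustManager }, null);

OkHttpClient client = new OkHttpClient.Builder()
        .sslSocketFactory(sslContext.getSocketFactory(), trustManager)
        .build();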

Step 4b: OkHttp2 version
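The OkHttp2 version uses the exact same certificate/KeyStore/SSLContext setup; only the last step differs (a sketch; com.squareup.okhttp is OkHttp2’s package):

import com.squareup.okhttp.OkHttpClient;

// ...same SSLContext setup as the OkHttp3 example above...
OkHttpClient client = new OkHttpClient();
client.setSslSocketFactory(sslContext.getSocketFactory());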

Run it again, and you’re good to go.