Reverse-engineering the fake Haven apps

As happens with successful and interesting apps, somebody made impostor copies of the Guardian Project’s Haven app, which got a lot of press due in part to Edward Snowden’s involvement.  As of this writing, the copycat has a dozen listings on the Play Store, all with little variations in the name.  (Hat tip to @rettiwtkrow, whose tweet was the first I saw of it.)

Interestingly, the fake apps have a slightly different icon from the real app.  Below is the real app’s icon on the left, and the fake app on the right.

I downloaded half a dozen of these copycats and started reverse-engineering them.  The copycats are all identical to one another; only the package name changes.  But they’re all completely different from the real app.  At first I wondered whether they might have ripped off the real app’s code, since it’s open source, but no.

When you open the fake, it’s immediately clear that they’re just using the name to drive downloads, and the functionality of the app isn’t even trying to look real.  It’s a run-of-the-mill crapware “cleaner” app.

It has a tab bar across the bottom, and each tab is a different tool: “Charge Booster”, “Battery Saver”, “CPU Cooler”, and “Junk Cleaner”.    They show some made-up stats about the device, and make promises they can’t deliver.  For the most part, as you run one of these tools, it displays an animation (that isn’t actually tied to any real work being done), and then a full-screen ad.  After one tool, I was presented with a fake Facebook sign-up page.

      

If you try to apply the “Ultra Power Saving Mode” it sends you to the OS settings app to enable “Allow modify system settings”.  I found this alarming, and assumed this is where I would find the nefarious code.  So I decompiled the app and poked around, and was honestly pretty underwhelmed.

I should note that the impostor app’s targetSdk is 24, meaning it has to ask for permissions at runtime, which is nice and a little surprising, given that Google isn’t yet forcing developers to do so.

Anyway, I need not have worried – after you grant this app the ability to modify system settings, all it does is turn down your screen brightness, disable autorotation, and disable background syncing for other apps.

(such battery savings. many mAh. very wow)

Then I got a little worried again when I saw the app has a BroadcastReceiver that runs immediately when the app is installed or updated.  This means the app has code that runs in the background even if you never open the app.  But again, I shouldn’t have worried because all it does is try to show a toast message saying “[app] Is Optimized by Fast Cleaner & Battery Saver.”

The bottom line is that these copycats appear to be using the Haven name and publicity just to drive downloads, then make a quick buck off of advertising.  I couldn’t find any evidence of anything more sinister than that.

For my last point, let’s talk about attribution.  I can’t say for sure, but I have an educated guess about where this came from.  There’s an app on the Google Play Store with almost entirely the same code.  Unlike the Haven copycats, this app has accurate screenshots on its Play Store listing, and its UI matches the Haven impostors.  Either this is the same developer, or it’s a false flag.  I can’t rule that out, so I’m not saying this with 100% confidence, but the preponderance of the evidence surely doesn’t look good for that developer.

I assume Google will be along shortly to remove the offending copycats (and hopefully terminate their ad accounts, and the rest of their apps under both publisher listings).  But it’s an important reminder that copycats exist, and it’s important to remain vigilant.

Those Android Versions Dashboards are Overrated

The Android Developers website has a chart showing how many Android devices are on each version of the OS.  You’ve probably seen it before.

Each month that chart is updated, and Android-focused blogs expend a lot of words analyzing the changes. [1 2 3]  Broader tech blogs use it to decry Android’s “fragmentation” problem. [1 2 3 4 5]

But I’m here to let you in on a little secret: that chart doesn’t mean what you think it means, and it’s not all that useful.  That chart represents all Android devices, all over the world, that have connected to the Google Play Store in the last seven days.  

To show why that group of devices isn’t the data we want, let’s compare that to data from one of my personal apps.  And because I’m a nice person, I’m not going to subject you to another pie chart.

For any Android version on the x-axis, the height represents the proportion of devices that are running that version or newer.  These are the numbers you need to make the important decision of “what is the lowest version of Android that I’m going to support?”  For one example, let’s look at Android 7.0.  The Google Play numbers would tell you that only 23.8% of devices are on Android 7.0 or newer, but for my app, that’s 67.8%, a difference of 44 percentage points.  That’s a night-and-day difference.
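The “version or newer” curve is just a cumulative sum taken from the newest version downward.  Here’s a minimal Java sketch of that calculation, assuming you’ve already exported per-API-level device counts from your own analytics; the class name, method name, and the counts in `main` are hypothetical, not real data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class VersionCoverage {
    /**
     * For each API level, returns the fraction of devices on that level or newer.
     * countsByApiLevel maps an API level to the number of devices reporting it.
     */
    static Map<Integer, Double> orNewer(Map<Integer, Integer> countsByApiLevel) {
        long total = 0;
        for (int count : countsByApiLevel.values()) {
            total += count;
        }
        // Walk from the newest API level down to the oldest, keeping a running total.
        List<Integer> levels = new ArrayList<>(countsByApiLevel.keySet());
        levels.sort(Comparator.reverseOrder());
        Map<Integer, Double> share = new TreeMap<>();
        long running = 0;
        for (int level : levels) {
            running += countsByApiLevel.get(level);
            share.put(level, (double) running / total);
        }
        return share;
    }

    public static void main(String[] args) {
        // Hypothetical device counts, for illustration only.
        Map<Integer, Integer> counts = new TreeMap<>();
        counts.put(19, 50);  // Android 4.4
        counts.put(23, 150); // Android 6.0
        counts.put(24, 300); // Android 7.0
        counts.put(26, 500); // Android 8.0
        orNewer(counts).forEach((level, fraction) ->
                System.out.printf("API %d or newer: %.1f%%%n", level, fraction * 100));
    }
}
```

With these made-up counts, API 24 (Android 7.0) or newer comes out to 80.0%; plug in your own app’s numbers to get the curve that actually matters for your minimum version.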

Why so different?

There are a lot of reasons that it’s different, but I’ll cover a few:

  • All over the world.  The Google Play numbers encompass Android devices all over the world.  When you see numbers broken down by country, there are big differences.  In most cases, your target market is not the whole world.
  • Not just phones. Google’s numbers include phones, tablets, and anything else running Android.  There are a lot of weird things running Android.  You don’t need to support somebody’s fridge.  For a lot of apps, you shouldn’t even care about tablets.
  • Not just primary phones.  I personally have at least eight devices that would show up in Google’s chart (i.e. I’ve powered them on in the last seven days), including an old tablet running Android 4.4 that I keep around for testing.  But if you’re trying to target me as a user, you don’t need to care about the oldest device on my desk, you just need to support my primary phone – which is running Android 8.0.

Takeaway

If you’re trying to figure out the lowest version of Android that you’re going to support for an existing app, make sure you’re looking at data from your app.  I guarantee you it’s going to be different from the public Google Play numbers.  And if you’re starting a new app, try to find a source of data that more closely matches your target market.

Most of the apps on my phone aren’t obfuscated

I think it’s important for any app that deals with sensitive user data to incorporate code obfuscation into its security.  While far from impenetrable, it’s a useful layer that makes it harder for reverse-engineers to understand your app and use that knowledge against you.  If you’ve wondered why I seem to be on a code obfuscation kick recently, it’s because I’ve noticed, anecdotally, that a lot of apps I expected to be using obfuscation weren’t.  So I set out to see if I could do some research and turn that anecdote into data.

I took all of the apps I have on my phone right now, and calculated how well the code was obfuscated (see methodology notes at the end of the post if you’re curious).  Here are the points that jumped out at me:

  • Most of the apps on my phone aren’t obfuscated. 53% of apps fall in the 0-20% obfuscated bucket, which means they probably didn’t have ProGuard turned on at all.
  • Most of the remaining apps are poorly-to-medium obfuscated. The next 35% of apps are 20-60% obfuscated, which means they probably put some effort into obfuscating the code, but weak configurations (like overusing the -keep directive) have kept much of their code un-obfuscated.
  • A small portion of apps are well obfuscated. Just 12% of apps are in the 60+% obfuscated range, where most of their code is very difficult to follow.

Why would an app with ProGuard turned off have a score greater than 0? A small part of that would be due to false positives (see the “Methodology” section, below) but most is due to third-party library code that has its internal implementation details obfuscated, before that code is even packaged into an app.

Methodology

    • Corpus – As alluded to in the post, I ran this analysis on the apps that I have installed on my personal phone.  This group of apps probably isn’t a perfectly representative sample of apps on the app store.  It is, however, a representative sample of apps that I actually care about 🙂
    • Tools – to calculate the “% obfuscated” metric, I used the `apkanalyzer` tool included in the Android SDK.  Looking through the classes, methods, and fields, I counted the number that appear to have been obfuscated (see next point), and added them all up for a ratio.
    • Metric – deciding whether something has been obfuscated isn’t straightforward; different obfuscation methods will yield different results.  But I know ProGuard is a common tool for Android obfuscation, and it prefers to transform names to single-character names, so I checked for single-character names.  This has the potential for false positives (e.g. x, y, z in a graphics data class would register as obfuscated even though those are probably the original names) and false negatives (any other obfuscation method might choose different names which aren’t single-character, which this analysis would miss).
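Here’s a rough Java sketch of the single-character-name heuristic described above.  It assumes the class, method, and field names have already been extracted from the APK (for example from `apkanalyzer`’s output, which isn’t parsed here); the class and method names are hypothetical, and the sample names are made up.

```java
import java.util.List;

class ObfuscationScore {
    /**
     * Heuristic: a name "looks obfuscated" if its simple name is a single character.
     * Package and outer-class prefixes (everything up to the last '.' or '$') are stripped first.
     */
    static boolean looksObfuscated(String name) {
        int cut = Math.max(name.lastIndexOf('.'), name.lastIndexOf('$'));
        return name.substring(cut + 1).length() == 1;
    }

    /** Percentage of names (classes, methods, and fields pooled together) that look obfuscated. */
    static double percentObfuscated(List<String> names) {
        if (names.isEmpty()) {
            return 0.0;
        }
        long hits = names.stream().filter(ObfuscationScore::looksObfuscated).count();
        return 100.0 * hits / names.size();
    }

    public static void main(String[] args) {
        // Made-up names; real input would be extracted from an APK.
        List<String> names = List.of(
                "com.example.a.b",          // class renamed by an obfuscator
                "com.example.MainActivity", // class name that survived
                "a",                        // obfuscated method
                "onCreate",                 // original method name
                "x",                        // field really named x (a false positive)
                "userName");                // original field name
        System.out.printf("%.1f%% obfuscated%n", percentObfuscated(names));
    }
}
```

The `x` entry shows the false-positive case from the methodology notes: it counts toward the score even though it was probably never renamed.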

 

ProGuard pro-tip: Don’t use ProGuard’s “-include” with Gradle builds

I like to organize my ProGuard rules by breaking them up into separate files, grouped by what they’re acting on — my data model, third-party libs, etc.

It’s tempting to reference one main ProGuard file from your build.gradle, and then put ‘-include’ entries in the main file to point to the rest, but there’s a problem with this. If you make a change in one of the included files, it won’t be picked up until you do a clean build.

The reason for this is Gradle’s model of inputs and outputs for tasks.  Gradle is really smart about avoiding unnecessary work, so if none of the inputs for a task have changed, then Gradle won’t rerun the task.  If you list one main ProGuard file in your build.gradle and include the rest from there, Gradle only sees the main file as an input.  So if you make changes in an included file, Gradle doesn’t think anything changed.
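To make the inputs-and-outputs problem concrete, here’s a toy Java model of an incremental task that skips itself when the contents of its declared input files haven’t changed.  It’s a deliberately simplified illustration, not Gradle’s actual up-to-date checking; `ToyTask` and the file names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyTask {
    private final Map<String, String> disk;    // file name -> current contents
    private final List<String> declaredInputs; // the only files this task knows about
    private Map<String, String> lastSnapshot;  // declared inputs as of the last run

    ToyTask(Map<String, String> disk, List<String> declaredInputs) {
        this.disk = disk;
        this.declaredInputs = declaredInputs;
    }

    /** Returns true if the task ran, false if it was skipped as up to date. */
    boolean execute() {
        Map<String, String> snapshot = new HashMap<>();
        for (String name : declaredInputs) {
            snapshot.put(name, disk.get(name));
        }
        if (snapshot.equals(lastSnapshot)) {
            return false; // nothing this task can see has changed
        }
        lastSnapshot = snapshot;
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> disk = new HashMap<>();
        disk.put("proguard-rules.pro", "-include model.pro");
        disk.put("model.pro", "-keepnames class com.example.model.**");

        ToyTask mainOnly = new ToyTask(disk, List.of("proguard-rules.pro"));
        ToyTask allFiles = new ToyTask(disk, List.of("proguard-rules.pro", "model.pro"));
        mainOnly.execute(); // first build: both tasks run
        allFiles.execute();

        // Edit only the included file.
        disk.put("model.pro", "-keepclassmembers class com.example.model.** { *; }");
        System.out.println("main file declared only, reruns: " + mainOnly.execute());   // false
        System.out.println("every rules file declared, reruns: " + allFiles.execute()); // true
    }
}
```

The task that only knows about the main file never notices the edit, which is the same stale-build symptom described above.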

The easy way I’ve found to work around this is to put all of my ProGuard rules into a directory, and include them all in my build.gradle with this snippet:

Distinguishing between the different ProGuard “-keep” directives

If you search for ProGuard rules for a Java or Android library, you’ll see a lot of answers on StackOverflow that tell you to do something like this:

-keep class com.foo.library.** { *; } 

That advice is really bad, and you should never do it.  First, it’s overly broad: that double asterisk in the package means every class in every package under that top-level package, and the asterisk inside the curly braces applies to every member (fields, methods, and constants) inside those classes.  That is, it applies to all code in the library.  If you use that rule, Jake Wharton is going to come yell at you:

Second, and what this post is about, is the beginning of the directive, that “-keep”.  You almost never want to use -keep; if you do need a ProGuard rule, you usually want one of the more specific variants below.  But it always takes me a minute with the ProGuard manual to figure out which one of those variants applies to my case, so I made some tables for quick visual reference.  (Quick aside: the ProGuard manual is very useful and I highly recommend you look through it.)

No rule

To get our bearings, let’s look at the default.  If you don’t specify a keep directive of any kind, then ProGuard is going to do its normal thing: it’s going to shrink (i.e. remove unused code) and obfuscate (i.e. rename things), for classes as well as class members.

-keep

See, this is why I said you should almost never use -keep.  -keep disables all of ProGuard’s goodness.  No shrinking, no obfuscation; not for classes, not for members.  In real use cases, you can let ProGuard do at least some of its work.  Even if your fields are accessed by reflection, for example, ProGuard could still remove unused classes and rename the ones that remain.  So let’s look through the more specific -keep variants.

-keepclassmembers

This protects only the members of the class from shrinking and obfuscation.  That is, if a class is unused, it will be removed.  If the class is used, the class will be kept but renamed.  But inside any class that is kept around, all of its members will be there, and they will have their original names.

-keepnames

This allows shrinking for classes and members, but not obfuscation.  That is, any unused code is going to get removed.  But the code that is kept will keep its original names.

-keepclassmembernames

This is the most permissive keep directive; it lets ProGuard do almost all of its work.  Unused classes are removed, the remaining classes are renamed, unused members of those classes are removed, but then the remaining members keep their original names.

-keepclasseswithmembers

This one doesn’t get a table, because it’s the same as -keep.  The difference is that it only applies to classes that have all of the members in the class specification.

-keepclasseswithmembernames

Similarly, this rule is the same as -keepnames.  The difference, again, is that it only applies to classes that have all of the members in the class specification.
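If it helps to have the same quick reference in code, here’s a hypothetical Java encoding of the directives described above, recording which of ProGuard’s four operations (shrinking and renaming, for classes and for members) each one still allows, plus a helper that picks the least restrictive directive for a given need.  The names are made up for illustration; ProGuard itself has no such API.

```java
import java.util.function.Predicate;

class KeepDirectives {
    // true = ProGuard may still do it; false = the directive prevents it.
    enum Directive {
        //                   shrink   rename   shrink   rename
        //                   classes  classes  members  members
        NONE(                true,    true,    true,    true),
        KEEP(                false,   false,   false,   false),
        KEEPCLASSMEMBERS(    true,    true,    false,   false),
        KEEPNAMES(           true,    false,   true,    false),
        KEEPCLASSMEMBERNAMES(true,    true,    true,    false);
        // -keepclasseswithmembers acts like KEEP and -keepclasseswithmembernames like
        // KEEPNAMES, but only on classes that have all the members in the specification.

        final boolean shrinkClasses, obfuscateClasses, shrinkMembers, obfuscateMembers;

        Directive(boolean shrinkClasses, boolean obfuscateClasses,
                  boolean shrinkMembers, boolean obfuscateMembers) {
            this.shrinkClasses = shrinkClasses;
            this.obfuscateClasses = obfuscateClasses;
            this.shrinkMembers = shrinkMembers;
            this.obfuscateMembers = obfuscateMembers;
        }

        /** Number of ProGuard operations this directive disables. */
        int restrictions() {
            int n = 0;
            for (boolean allowed : new boolean[] {shrinkClasses, obfuscateClasses, shrinkMembers, obfuscateMembers}) {
                if (!allowed) {
                    n++;
                }
            }
            return n;
        }
    }

    /** Picks the directive that disables the fewest operations while still meeting the need. */
    static Directive leastRestrictive(Predicate<Directive> meetsNeed) {
        Directive best = null;
        for (Directive d : Directive.values()) {
            if (meetsNeed.test(d) && (best == null || d.restrictions() < best.restrictions())) {
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Need: fields set via reflection must be neither removed nor renamed.
        System.out.println(leastRestrictive(d -> !d.shrinkMembers && !d.obfuscateMembers)); // KEEPCLASSMEMBERS
        // Need: class and member names must survive, but unused code may be removed.
        System.out.println(leastRestrictive(d -> !d.obfuscateClasses && !d.obfuscateMembers)); // KEEPNAMES
    }
}
```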

Conclusion

You want to let ProGuard do as much work as possible, so pick the directive that disables the fewest of ProGuard’s operations (shrinking and obfuscation, for classes and for members) while still meeting your need.

Getting Started with Android Things

I’m giving a presentation tonight at the Charlotte Android Developer / GDG meetup about Getting Started with Android Things.

Update: here’s the video!

Here are the slides.

And some relevant links:

An Android runtime permissions corner case

I recently got curious about what happens to the runtime permissions state when the user revokes or grants a permission using the OS Settings app, rather than allowing or denying in the dialog prompt that an app can show to request permission.

For background, there are two parts to the state that an app can query at any time:

The question I got curious about was this: what happens when the user revokes a permission in the settings app?  If they had previously chosen “don’t ask again”, but then later grant the permission in settings, then revoke the permission again, can you prompt for the permission?

I made a simple app to try it out and mapped out the possible states.  The answer is that if the user revokes a permission in settings, all of that history essentially goes away.  You’re left in the same state as if you had asked for permission and the user had denied it once, so you can ask again.
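Here’s a simplified Java model of those states and transitions, as a sketch of the behavior described above rather than of Android’s implementation.  It assumes the two queryable values are the ones exposed by `checkSelfPermission()` and `shouldShowRequestPermissionRationale()`, assumes (as on Android 6.0 through 10) that the “don’t ask again” checkbox only appears after a first denial, and uses hypothetical state and event names.

```java
class PermissionModel {
    enum State {
        NEVER_ASKED,           // not granted; shouldShowRequestPermissionRationale() is false
        DENIED_ONCE,           // not granted; shouldShowRequestPermissionRationale() is true
        DENIED_DONT_ASK_AGAIN, // not granted; rationale is false again and requests are auto-denied
        GRANTED                // checkSelfPermission() returns PERMISSION_GRANTED
    }

    enum Event { DIALOG_ALLOW, DIALOG_DENY, DIALOG_DENY_DONT_ASK_AGAIN, SETTINGS_GRANT, SETTINGS_REVOKE }

    /** Whether requesting the permission would actually show the system dialog. */
    static boolean canPrompt(State s) {
        return s == State.NEVER_ASKED || s == State.DENIED_ONCE;
    }

    static State next(State s, Event e) {
        switch (e) {
            case SETTINGS_GRANT:
                return State.GRANTED;
            case SETTINGS_REVOKE:
                // The observed behavior: revoking in Settings discards the earlier history
                // and leaves the app as if the user had denied the dialog once.
                return s == State.GRANTED ? State.DENIED_ONCE : s;
            case DIALOG_ALLOW:
                return canPrompt(s) ? State.GRANTED : s;
            case DIALOG_DENY:
                return canPrompt(s) ? State.DENIED_ONCE : s;
            case DIALOG_DENY_DONT_ASK_AGAIN:
                // Assumption: the checkbox is only offered after an earlier denial.
                return s == State.DENIED_ONCE ? State.DENIED_DONT_ASK_AGAIN : s;
            default:
                throw new IllegalArgumentException("unhandled event: " + e);
        }
    }

    public static void main(String[] args) {
        State s = State.NEVER_ASKED;
        Event[] history = {
            Event.DIALOG_DENY, Event.DIALOG_DENY_DONT_ASK_AGAIN, Event.SETTINGS_GRANT, Event.SETTINGS_REVOKE
        };
        for (Event e : history) {
            s = next(s, e);
            System.out.println(e + " -> " + s + " (can prompt: " + canPrompt(s) + ")");
        }
    }
}
```

The scenario in `main` is the corner case from this post: after “don’t ask again”, a grant and then a revoke in Settings, the model ends in `DENIED_ONCE`, where prompting works again.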

I realize this isn’t the sort of thing that’s going to happen in the course of normal usage; I don’t expect most users to ever toggle those permissions from the settings app.  But I think it’s worth understanding how it works.

How (often) Does a Bill Become a Law?

[tl;dr — about a quarter of the time]

Every year as state legislative sessions get started, there’s a flurry of scary headlines of the form: “[State] [Party] files bill to do [thing I find scary, bigoted, or irresponsible]”.  It’s tough to know from reading whether this is something you should get concerned about.  

Maybe you should, maybe this is the next big bill that will make your state a national embarrassment, spark economic boycotts, and cause your Governor to lose his re-election campaign.  Or maybe this is a crackpot bill and it’s not going anywhere, so don’t waste your time.  

Good reporting can help to fill this in – is this a senior legislator, are they a known kook, etc.  But there’s another, more general piece of this context that I wanted to help fill in.  My question is, overall: how often does a bill go from being filed (its first official step) to becoming law (its final official step)?

First you need at least a rough understanding of how a legislature works.  You could watch the classic Schoolhouse Rock video:

Or I’ll give you the 60-second version (skip to the next paragraph if you already know how it works): a legislator files a bill, which is a thing that they want to become law.  The person who files it is called the “sponsor” or “patron.”  The bill is referred to a relevant committee, which is made up of a subset of the legislators in their chamber (usually the House or Senate).  If a majority of the committee members vote for it, it goes to the full chamber; if it gets a majority vote there, it’s handed over to the other chamber, and the committee and full-chamber process repeats.  After that, the governor signs the bill and it becomes law.  (Exact details vary by state, but it’s probably pretty close to that, unless you’re in Nebraska.  Find your legislature’s website; it probably has a “how a bill becomes a law” page that explains its version of the process.)

The important thing to know is that bills can (and do) fail at any of those steps in the previous paragraph.  Depending on the rules of the chamber, if the leadership doesn’t like the sponsor, or the bill, it may never even come up for a vote.  Or those votes will happen in a closed session of a subcommittee, where the public doesn’t get to see which legislators voted to kill a bill.

Back to the point, I downloaded the history of every bill filed in the North Carolina General Assembly (NCGA), my state legislature, since 1985 and parsed out its fate.  The answer to my question was 23.6% — just under a quarter of our bills actually become law.  

So when you see that scary headline, keep in mind that a newly filed bill is actually more likely to fail than to become law.  I’m not telling you to ignore it, or not to fight it; on the contrary, this should encourage you to fight.  Be part of the reason that the bill you don’t like failed.  Get out there and kill that bill.

I’m sorry, I had to.  I don’t even like this movie, but the pun was just sitting right there.

Since I’ve got all this data, I decided to answer another question quickly: how often does the governor veto bills?  This next chart shows how often, when given the option to sign or veto a bill, the governor chose to veto:

So, uh, not often.  There’s a tiny little spike in the 2011-2012 session, when we had a Democratic governor and a Republican-controlled legislature, but even then it barely registers on the chart.

If you got this far and you think this was at least a little neat, I want to do something for you.  I’ve got all this data downloaded and parsed, and I’m sifting through it, but I want to know what questions I should try to answer.  Here are a few that I want to tackle next:

  • a breakdown of where bills fail – in committee, in the full chamber, in the other chamber.
  • which legislators get more of their bills passed?
  • are bills with more co-sponsors more likely to pass?

Can you come up with others?  If you have a question or a hypothesis, tweet it at me @jebstuart and I’ll see if I can answer it.

Following are a few notes about methodology & assumptions:

Why start in 1985? That’s as far back as the easily-available online records go (and almost my full lifetime) so I’m calling that “the relevant dataset”.  I feel like it’s enough data to get a good sense of the overall pattern.  

Why are there only data points every other year?  The NCGA operates in two-year sessions (e.g. we’re currently in the 2017-2018 session), and that’s how their bills are filed.

What about resolutions?  An NC legislator can file a “bill” or a “resolution”, but resolutions are typically honorary things without legal weight, so I excluded them from the dataset as not relevant to the question I wanted to answer.

What about “extra” (or “special”) sessions?  I didn’t include them either.  They’re typically single-purpose sessions with few bills filed, and I didn’t want to clutter the data.

Are the numbers the same for [other state]? Not the exact same, for sure.  Maybe similar, maybe not.  I have a hunch these results are probably “typical”, within a standard deviation or so, but I can’t verify that without a lot of work that I’m not going to do.
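For the curious, here’s a minimal Java sketch of the two ratios above (the share of bills that became law, and the governor’s veto rate), applying the exclusions from these notes: resolutions and extra sessions are dropped.  The `Measure` shape is a hypothetical stand-in for the parsed NCGA history, the sample data is made up, and it simplifies by leaving any bill the governor neither signed nor vetoed out of the veto-rate denominator.

```java
import java.util.List;

class BillStats {
    enum Kind { BILL, RESOLUTION }
    enum GovernorAction { NONE, SIGNED, VETOED }

    /** Hypothetical, simplified summary of one measure's parsed history. */
    static final class Measure {
        final Kind kind;
        final boolean extraSession;
        final boolean becameLaw;
        final GovernorAction governorAction;

        Measure(Kind kind, boolean extraSession, boolean becameLaw, GovernorAction governorAction) {
            this.kind = kind;
            this.extraSession = extraSession;
            this.becameLaw = becameLaw;
            this.governorAction = governorAction;
        }

        /** Only regular-session bills are in the dataset. */
        boolean inDataset() {
            return kind == Kind.BILL && !extraSession;
        }
    }

    /** Share of bills in the dataset that became law. */
    static double passRate(List<Measure> measures) {
        long filed = 0;
        long enacted = 0;
        for (Measure m : measures) {
            if (!m.inDataset()) continue;
            filed++;
            if (m.becameLaw) enacted++;
        }
        return filed == 0 ? 0.0 : (double) enacted / filed;
    }

    /** Of the bills in the dataset that the governor signed or vetoed, the share vetoed. */
    static double vetoRate(List<Measure> measures) {
        long decided = 0;
        long vetoed = 0;
        for (Measure m : measures) {
            if (!m.inDataset() || m.governorAction == GovernorAction.NONE) continue;
            decided++;
            if (m.governorAction == GovernorAction.VETOED) vetoed++;
        }
        return decided == 0 ? 0.0 : (double) vetoed / decided;
    }

    public static void main(String[] args) {
        // Made-up sample, not NCGA data.
        List<Measure> sample = List.of(
                new Measure(Kind.BILL, false, true, GovernorAction.SIGNED),
                new Measure(Kind.BILL, false, false, GovernorAction.NONE),      // died in committee
                new Measure(Kind.BILL, false, false, GovernorAction.VETOED),    // veto not overridden
                new Measure(Kind.BILL, false, false, GovernorAction.NONE),      // never got a floor vote
                new Measure(Kind.RESOLUTION, false, true, GovernorAction.NONE), // excluded
                new Measure(Kind.BILL, true, true, GovernorAction.SIGNED));     // extra session, excluded
        System.out.printf("pass rate: %.1f%%%n", passRate(sample) * 100);
        System.out.printf("veto rate: %.1f%%%n", vetoRate(sample) * 100);
    }
}
```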

Gerrymandering in NC, or, A Tale of Two States

I moved to North Carolina in 2011 and, as I’ve come to learn more about our politics, I’ve been struck by something that seems impossible: North Carolina has two separate realms of politics.  In statewide elections, NC is split pretty much 50/50 between the two major parties.  But in races where the state is split into districts, like the state legislature, the Republican Party controls a supermajority of seats (over 60%).  With over 60% of the seats in the NC House and Senate, the Republicans can override the Governor’s veto, severely limiting the executive’s power.  

In the first chart are the statewide races – some are won by Democrats, some by Republicans, but they’re all within a couple percent of 50/50.  In the second chart, we see the various districted houses in North Carolina, and all of them have 60+% Republican wins.  I’ve spent much of the last year puzzling over how these two types of elections could produce such different results from the same voters.  My background is in computer science and math, so I’ve tried to find an answer to this question the best way I can: using data.

Our state legislature, the North Carolina General Assembly (NCGA), is elected every two years, as are our Congressional representatives.  The US President and our statewide offices (Governor, Lieutenant Governor, Attorney General, etc.) are elected every four years, and US Senators every six years.  So we have a lot of data to work with.  I started by downloading the raw election result data for each of the previously mentioned races for the last fifteen years from the State Board of Elections.  (Before 2002, the data gets harder to access and less detailed.)

What I found is that, before I moved here, elections weren’t quite so lopsided.
Clearly something changed, abruptly, between 2010 and 2012.  I don’t think it was me moving to Charlotte.  More importantly, it wasn’t that the electorate faced a massive shift between parties — if you look at statewide elections, the party splits stayed roughly the same.  There’s some variability as you look at individual races, but you see they’re mostly within the 45-55% range.

So what changed?

The district maps changed.  Following the 2010 census, as happens after each census, district boundaries were redrawn to rebalance the shifting population across the districts.  But in 2010 the Republican Party had just won control of the NCGA, so it got to control the redistricting process; the prior maps had been drawn by a Democratic-controlled legislature.

Now it’s important to understand how strategic redistricting works.  If you want to skew the district map to your party’s advantage, you create a small number of districts with as many of your opponent’s voters as you can cram in (“packing”), and spread the rest of their voters across a larger number of districts, each with just enough of your voters to reliably win (“cracking”).  If you’re unfamiliar, I’d recommend taking a minute to read Wikipedia’s primer, which has some helpful examples of why this is so effective.  Also, fun fact, the first district they show as an example of egregious gerrymandering is the district I was in when I first moved to Charlotte, the NC-12th.

In any two-candidate election, just under half of the votes are “wasted”; that is, they don’t contribute to the election of a candidate.  This includes all of the votes for the losing candidate, and all of the votes for the winner past a simple majority.  If you pack and crack effectively, you can skew the districts so that your party wastes fewer of its votes and your opponent wastes more of theirs, and you can create a map where a 50/50 citizenry elects a supermajority of one party.
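Here’s a small Java sketch of the wasted-vote arithmetic on a hypothetical five-district map with 100 voters per district and 500 voters split exactly 50/50.  It assumes a two-party race, counts the winner’s wasted votes as everything beyond `total / 2 + 1`, and doesn’t handle ties; the district numbers are made up to show packing (one heavily Democratic district) and cracking (four districts Republicans win by about 61 to 39).

```java
import java.util.Arrays;

class WastedVotes {
    /**
     * Wasted votes per party, summed over districts of a two-party race.
     * Each district is {democraticVotes, republicanVotes}. All votes for the
     * loser are wasted; the winner wastes every vote beyond total / 2 + 1.
     * Returns {democraticWasted, republicanWasted}.
     */
    static long[] wasted(int[][] districts) {
        long wastedD = 0;
        long wastedR = 0;
        for (int[] district : districts) {
            int d = district[0];
            int r = district[1];
            if (d == r) {
                throw new IllegalArgumentException("ties are not modeled");
            }
            int needed = (d + r) / 2 + 1; // a simple majority
            if (d > r) {
                wastedD += d - needed;
                wastedR += r;
            } else {
                wastedR += r - needed;
                wastedD += d;
            }
        }
        return new long[] {wastedD, wastedR};
    }

    public static void main(String[] args) {
        // Hypothetical map: 500 voters, exactly 250 per party.
        int[][] map = {
            {95, 5},  // packed: Democrats win by a huge margin
            {39, 61}, // cracked: Republicans win comfortably but without many surplus votes
            {39, 61},
            {39, 61},
            {38, 62},
        };
        long[] w = wasted(map);
        long republicanSeats = Arrays.stream(map).filter(district -> district[1] > district[0]).count();
        System.out.printf("Democrats wasted %d of 250 votes (%.1f%%)%n", w[0], 100.0 * w[0] / 250);
        System.out.printf("Republicans wasted %d of 250 votes (%.1f%%)%n", w[1], 100.0 * w[1] / 250);
        System.out.println("Republican seats: " + republicanSeats + " of " + map.length);
    }
}
```

On this made-up map, a 50/50 electorate elects Republicans in four of five districts: Democrats waste 199 of their 250 votes (79.6%) while Republicans waste 46 (18.4%).  Together that’s 245 of 500 votes, just under half.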

So the goal of gerrymandering is to waste as many of your opponent’s votes, and as few of your own votes, as possible.  Let’s see if we notice any change in wasted votes after the 2010 redistricting.  To keep the chart easier to read, let’s start with just one district map – the NC House of Representatives.  Let’s look at the percent of votes that each party wasted.

There it is: a pretty clear jump from close margins in 2002-2010 to very lopsided vote wasting in 2012 and later.  And lest you think I cherry-picked the worst map, I actually chose the least dramatic.  The NC Senate and US House results are even more lopsided.

In case you’re not a chart person, here’s what that last one says: since 2012, the Congressional maps have been drawn so that over ⅔ of Democrats are wasting their votes, while fewer than ⅓ of Republicans are.

If you look at all of the elections 2002-2010, the highest percentage of wasted votes that any party had was 59.6% (Republicans in the Senate in 2006).  In every single election 2012-2016, the Democrats have wasted more than that.  It’s impossible for me to overstate that sharp division.  Every single election since 2012 has been more skewed than the worst outlier of the decade before.  Every single election since 2012 has been less fair than all of the elections of the prior decade.  

Back to my original question – how did we end up with 50/50 elections for governor, but a legislature that’s had 60% or more seats go to one party, the same party, in every election since 2012?  The best answer I have for that is to look at the districting.  I believe that the data show that the districts were drawn to systematically favor one party by wasting more of their opponent’s votes.

 

Certificate Pinning with OkHttp

Now that you know how to run a man-in-the-middle attack, you want to know how to prevent one. The way we’re going to do that is by “pinning” your certificate.

How does it work?

HTTPS begins its connection by verifying the certificate that’s being presented by the server. It checks all of the signatures in the chain of trust, and that the root certificate authority (CA) is any one of the trusted roots installed in your OS. What we do with pinning is to add one more check — the client checks that one of the certificates in the chain of trust is exactly the certificate it’s expecting.
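Here’s a minimal Java sketch of that extra check, using only the standard library.  It computes pins in the same `sha256/` + Base64(SHA-256 of the public key’s SubjectPublicKeyInfo) format that OkHttp’s `CertificatePinner` uses, and accepts a chain only if some key in it matches a pin.  The helper names are hypothetical; in a real client the keys would come from `X509Certificate.getPublicKey()` on the chain presented during the TLS handshake (after the normal validation), whereas here freshly generated RSA keys stand in for certificates.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PublicKey;
import java.util.Base64;
import java.util.List;
import java.util.Set;

class PinCheck {
    /** "sha256/" + Base64(SHA-256(SubjectPublicKeyInfo)). */
    static String pin(PublicKey key) {
        try {
            // For an X.509-format public key, getEncoded() returns the DER SubjectPublicKeyInfo.
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(key.getEncoded());
            return "sha256/" + Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("SHA-256 unavailable", e); // required on every Java platform
        }
    }

    /** The extra pinning check: at least one key in the already-validated chain must match a pin. */
    static boolean matchesPin(List<PublicKey> chainKeys, Set<String> pins) {
        for (PublicKey key : chainKeys) {
            if (pins.contains(pin(key))) {
                return true;
            }
        }
        return false;
    }

    /** Stand-in for a certificate's key; a real client would call X509Certificate.getPublicKey(). */
    static PublicKey newRsaKey() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair().getPublic();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PublicKey serverKey = newRsaKey();
        PublicKey caKey = newRsaKey();
        PublicKey proxyKey = newRsaKey();     // key on a mitmproxy-style bogus leaf certificate
        PublicKey bogusRootKey = newRsaKey(); // key of the bogus root CA installed on the device
        Set<String> pins = Set.of(pin(serverKey));

        System.out.println("pin: " + pin(serverKey));
        System.out.println("real chain accepted: " + matchesPin(List.of(serverKey, caKey), pins));        // true
        System.out.println("bogus chain accepted: " + matchesPin(List.of(proxyKey, bogusRootKey), pins)); // false
    }
}
```

Because the pin is derived from the public key alone, a renewed certificate that reuses the same key produces the same pin.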

The man-in-the-middle attack that we ran relied on installing a bogus trusted root CA in the operating system, which mitmproxy used to create a bogus chain of trust.  But if you’ve pinned the real certificate, this attack fails, because none of the certificates in mitmproxy’s bogus chain carry your pinned key, and mitmproxy can’t use your server’s real key without its private key.

OK let’s do it

OkHttp3

    1. Set up a certificate pinner on your OkHttpClient with a dummy signature for your domain:
    2. Run with that certificate pinner, and you’ll get an SSLPeerUnverifiedException showing the expected hash (your bogus one) and the actual hashes for each certificate in the chain.
    3. Replace the bogus hash in your code with the hash of the certificate you want to pin, and run it again.
    4. Now try running with mitmproxy again to verify that it fails.
OkHttp2
  1. Same steps as above, but here’s the code for your dummy pinner:

Let’s talk about some of the things you need to think about:

Renewals – certificates aren’t forever

When you look at any of the certificates in your chain, you’ll notice that they all have expiration dates. Sometime before that expiration date, your system administrator is (hopefully) going to renew that certificate – that is, they’ll get a new certificate for your server. If you’re not planning ahead and talking to your sysadmin, this could break your connections.

Pin the whole certificate or just the public key?

If you were manually verifying the certificate, you might think of hashing the whole certificate and ensuring that matches your expected value. That’s going to be brittle, because if the certificate gets renewed and you’re pinning to the entire certificate, your pin is going to break.

So in the example above, OkHttp is pinning the public key info, not the whole certificate. So if the certificate gets renewed and the new certificate has the same public key, your pin should continue to work fine.

Which certificate in the chain to pin?

You can pin any of the certificates in your chain of trust: the root CA, an intermediate CA, or your server’s certificate. Let’s look at some pros and cons of each.

If you pin one of the CA certificates, then as long as your server gets a new certificate from the same CA, it doesn’t matter if your certificate’s public key changes. This is potentially useful if your server is hosted by somebody else and you might change providers. The downside is you’re locked into that CA. You have to get your renewed certificate from the same CA, and their public key has to be unchanged.

On the other hand, if you pin your server’s certificate, then you can get your renewed certificate from any CA you want, as long as your server’s public key is unchanged.

Bottom line: talk to your system administrator to ensure that your pinning won’t break when they renew the certificate.

UPDATE — see also: