2017-11-04

The Salsa20 Cipher in PowerShell, For Some Reason

Sometimes, against my better judgment, I find myself working in ultra-restrictive secure environments, the computer equivalent of a clean room, or like that room in Mission Impossible where Tom Cruise had to hang from wires because everything around him was being monitored by a security force.

Modern secure-access tools aren't a bad thing, per se. They keep China at bay. But sometimes they're just a little too secure.

Long story short, sometimes in trying to keep the bad guys from doing bad things, secure workstations keep the good guys from doing good things, too. For example, sometimes they keep you from using builtin cryptography.

That is quite possibly the dumbest thing I have ever witnessed in my life, and I once legitimately asked my social studies teacher what they called World War I before there was a World War II[0].

I ended up writing one of the many AES variants in PowerShell. Not because someone asked me to do so, but because goddammit, no one is going to tell me when I can and cannot move secrets between machines. The secure environment restricted most of my access to OS-native cryptography primitives and even to character-encoding classes. This is kind of like asking a person to swim with one or both hands tied behind their back. You can do it, but don't expect the person to succeed.

AES isn't a bad cipher. It's not perfect. I've disliked S-boxes ever since I found out Feistel grew up in Germany during that whole "starting all the wars" era. It's been heavily audited, there's no practical attack on its pure design and its strongest versions are known to be vulnerable only to weaknesses in implementation. (I may oversimplify here. AES is still good enough for your bank to use, whatever that may be worth.) There's nothing inherently wrong with AES yet. My PowerShell-only implementation had no optimizations and is pitifully slow[1]. Like, super duper slow. How slow? Four bytes per second. Not kilobytes. Bytes.

So I went back to my original idea, which was to try a modern cipher that could be run from PowerShell and give "good enough" security to data in transit or kept on disk for short periods of time. Things like temporary passwords and OAuth tokens. Not things like banking account information or Social Security numbers. China already has those.

I spent a little time this weekend re-reading and implementing DJB's Salsa20 cipher in pure PowerShell. It took about a night. Let's get to it, shall we?

The Salsa20 cipher is a really clever design. Given a key and a nonce, it creates a deterministic set of up to 2^70 64-byte outputs that can be XOR-ed with plaintext in order to create an invertible ciphertext. Given the same key and series of nonces, the ciphertext can be decrypted with the same set of steps.

Salsa20 defines a "rotation" operation that has been around since DJB's old SURF days. It shifts the bits of a 32-bit integer around. In C it looks like this:

#define ROTATE(x,b) (((x) << (b)) | ((x) >> (32 - (b))))

In PowerShell, do this:

Function Salsa20-Rotate {
  Param(
    [UInt32]                    $u,
    [Byte][ValidateRange(0,32)] $c
  )
  [UInt32] $m = $u -shl $c
  [UInt32] $n = $u -shr (32 - $c)
  Return [UInt32] ($m -bor $n)
}

The Salsa20 paper also defines a "littleendian" function used to convert 4-byte arrays into 32-bit integers. The array [1,2,3,4] would be the integer 0x04030201 for example. In practice, you need two functions, one to convert an array into an integer, and another to convert the integer back into an array:

Function Salsa20-LoadLittleEndian {
  Param([Byte[]] $x)
  [UInt32] $out = $x[3]; $out *= 256
  $out += $x[2]; $out *= 256
  $out += $x[1]; $out *= 256
  $out += $x[0]
  Return $out
}

Function Salsa20-StoreLittleEndian {
  Param([UInt32] $u)
  $x    = New-Object Byte[] 4
  $x[0] = $u -band 255; $u = $u -shr 8
  $x[1] = $u -band 255; $u = $u -shr 8
  $x[2] = $u -band 255; $u = $u -shr 8
  $x[3] = $u
  Return $x
}

PowerShell PowerShell also lacks a decent integer arithmetic feature, as far as I know. If you add a UInt32 value to a UInt32 value, it will try to keep any overflow from occurring. Sometimes more is less, so when I add 1 to the maximum possible value an unsigned 32-bit integer can have, I want the value to be zero. To address this, create a "sum" function that will add two UInt32 values and return a UInt32 value:

Function Salsa20-Sum {
  Param(
    [UInt64] $a,
    [UInt64] $b
  )
  Return [UInt32]([UInt32]::MaxValue -band ($a + $b))
}

That's all you really need for Salsa20 primitives. I dabbled with writing the quarterround, rowround, and columnround functions as the Salsa20 paper defines them, but the NaCl library doesn't bother with any of that for its Salsa20 reference implementation. So I just used that:

[Byte[]] $c = 101,120,112,97,110,100,32,51,50,45,98,121,116,101,32,107 # "expand 32-byte k"

Function Salsa20-Core {
  Param(
    [Parameter(Mandatory=$True,Position=0)]
    [Byte[]] $k,
    [Parameter(Mandatory=$True,Position=1)]
    [Byte[]] $in,
    [Parameter(Mandatory=$False,Position=2)]
    [Byte] $Rounds = 20
  )

  [UInt32] $j0  = $x0  = Salsa20-LoadLittleEndian  $c[ 0..3]
  [UInt32] $j1  = $x1  = Salsa20-LoadLittleEndian  $k[ 0..3]
  [UInt32] $j2  = $x2  = Salsa20-LoadLittleEndian  $k[ 4..7]
  [UInt32] $j3  = $x3  = Salsa20-LoadLittleEndian  $k[ 8..11]
  [UInt32] $j4  = $x4  = Salsa20-LoadLittleEndian  $k[12..15]
  [UInt32] $j5  = $x5  = Salsa20-LoadLittleEndian  $c[ 4..7]
  [UInt32] $j6  = $x6  = Salsa20-LoadLittleEndian $in[ 0..3]
  [UInt32] $j7  = $x7  = Salsa20-LoadLittleEndian $in[ 4..7]
  [UInt32] $j8  = $x8  = Salsa20-LoadLittleEndian $in[ 8..11]
  [UInt32] $j9  = $x9  = Salsa20-LoadLittleEndian $in[12..15]
  [UInt32] $j10 = $x10 = Salsa20-LoadLittleEndian  $c[ 8..11]
  [UInt32] $j11 = $x11 = Salsa20-LoadLittleEndian  $k[16..19]
  [UInt32] $j12 = $x12 = Salsa20-LoadLittleEndian  $k[20..23]
  [UInt32] $j13 = $x13 = Salsa20-LoadLittleEndian  $k[24..27]
  [UInt32] $j14 = $x14 = Salsa20-LoadLittleEndian  $k[28..31]
  [UInt32] $j15 = $x15 = Salsa20-LoadLittleEndian  $c[12..15]

  for ($i = $Rounds; $i -gt 0; $i-=2) {
    $x4  = $x4  -bxor (Salsa20-Rotate (Salsa20-Sum $x0  $x12) 7)
    $x8  = $x8  -bxor (Salsa20-Rotate (Salsa20-Sum $x4  $x0)  9)
    $x12 = $x12 -bxor (Salsa20-Rotate (Salsa20-Sum $x8  $x4)  13)
    $x0  = $x0  -bxor (Salsa20-Rotate (Salsa20-Sum $x12 $x8)  18)
    $x9  = $x9  -bxor (Salsa20-Rotate (Salsa20-Sum $x5  $x1)  7)
    $x13 = $x13 -bxor (Salsa20-Rotate (Salsa20-Sum $x9  $x5)  9)
    $x1  = $x1  -bxor (Salsa20-Rotate (Salsa20-Sum $x13 $x9)  13)
    $x5  = $x5  -bxor (Salsa20-Rotate (Salsa20-Sum $x1  $x13) 18)
    $x14 = $x14 -bxor (Salsa20-Rotate (Salsa20-Sum $x10 $x6)  7)
    $x2  = $x2  -bxor (Salsa20-Rotate (Salsa20-Sum $x14 $x10) 9)
    $x6  = $x6  -bxor (Salsa20-Rotate (Salsa20-Sum $x2  $x14) 13)
    $x10 = $x10 -bxor (Salsa20-Rotate (Salsa20-Sum $x6  $x2)  18)
    $x3  = $x3  -bxor (Salsa20-Rotate (Salsa20-Sum $x15 $x11) 7)
    $x7  = $x7  -bxor (Salsa20-Rotate (Salsa20-Sum $x3  $x15) 9)
    $x11 = $x11 -bxor (Salsa20-Rotate (Salsa20-Sum $x7  $x3)  13)
    $x15 = $x15 -bxor (Salsa20-Rotate (Salsa20-Sum $x11 $x7)  18)
    $x1  = $x1  -bxor (Salsa20-Rotate (Salsa20-Sum $x0  $x3)  7)
    $x2  = $x2  -bxor (Salsa20-Rotate (Salsa20-Sum $x1  $x0)  9)
    $x3  = $x3  -bxor (Salsa20-Rotate (Salsa20-Sum $x2  $x1)  13)
    $x0  = $x0  -bxor (Salsa20-Rotate (Salsa20-Sum $x3  $x2)  18)
    $x6  = $x6  -bxor (Salsa20-Rotate (Salsa20-Sum $x5  $x4)  7)
    $x7  = $x7  -bxor (Salsa20-Rotate (Salsa20-Sum $x6  $x5)  9)
    $x4  = $x4  -bxor (Salsa20-Rotate (Salsa20-Sum $x7  $x6)  13)
    $x5  = $x5  -bxor (Salsa20-Rotate (Salsa20-Sum $x4  $x7)  18)
    $x11 = $x11 -bxor (Salsa20-Rotate (Salsa20-Sum $x10 $x9)  7)
    $x8  = $x8  -bxor (Salsa20-Rotate (Salsa20-Sum $x11 $x10) 9)
    $x9  = $x9  -bxor (Salsa20-Rotate (Salsa20-Sum $x8  $x11) 13)
    $x10 = $x10 -bxor (Salsa20-Rotate (Salsa20-Sum $x9  $x8)  18)
    $x12 = $x12 -bxor (Salsa20-Rotate (Salsa20-Sum $x15 $x14) 7)
    $x13 = $x13 -bxor (Salsa20-Rotate (Salsa20-Sum $x12 $x15) 9)
    $x14 = $x14 -bxor (Salsa20-Rotate (Salsa20-Sum $x13 $x12) 13)
    $x15 = $x15 -bxor (Salsa20-Rotate (Salsa20-Sum $x14 $x13) 18)
  }

  $x0  = Salsa20-Sum $x0 $j0
  $x1  = Salsa20-Sum $x1 $j1
  $x2  = Salsa20-Sum $x2 $j2
  $x3  = Salsa20-Sum $x3 $j3
  $x4  = Salsa20-Sum $x4 $j4
  $x5  = Salsa20-Sum $x5 $j5
  $x6  = Salsa20-Sum $x6 $j6
  $x7  = Salsa20-Sum $x7 $j7
  $x8  = Salsa20-Sum $x8 $j8
  $x9  = Salsa20-Sum $x9 $j9
  $x10 = Salsa20-Sum $x10 $j10
  $x11 = Salsa20-Sum $x11 $j11
  $x12 = Salsa20-Sum $x12 $j12
  $x13 = Salsa20-Sum $x13 $j13
  $x14 = Salsa20-Sum $x14 $j14
  $x15 = Salsa20-Sum $x15 $j15

  $out = @()
  $out += Salsa20-StoreLittleEndian $x0
  $out += Salsa20-StoreLittleEndian $x1
  $out += Salsa20-StoreLittleEndian $x2
  $out += Salsa20-StoreLittleEndian $x3
  $out += Salsa20-StoreLittleEndian $x4
  $out += Salsa20-StoreLittleEndian $x5
  $out += Salsa20-StoreLittleEndian $x6
  $out += Salsa20-StoreLittleEndian $x7
  $out += Salsa20-StoreLittleEndian $x8
  $out += Salsa20-StoreLittleEndian $x9
  $out += Salsa20-StoreLittleEndian $x10
  $out += Salsa20-StoreLittleEndian $x11
  $out += Salsa20-StoreLittleEndian $x12
  $out += Salsa20-StoreLittleEndian $x13
  $out += Salsa20-StoreLittleEndian $x14
  $out += Salsa20-StoreLittleEndian $x15

  Return $out
}

As I said earlier, the key to using Salsa20 as an encryption cipher is to XOR the message with the given output of a Salsa20 block: the input to the core function isn't the message, it's just a key and a nonce. To take a series of bytes and encrypt it with a provided encryption key, you could do something like this:

Function Salsa20-EncryptBlock {
  Param(
    [Parameter(Mandatory=$True,Position=0)]
    [ValidateNotNullOrEmpty()]
    [Byte[]] $Key,

    [Parameter(Mandatory=$True,Position=1)]
    [ValidateNotNullOrEmpty()]
    [Byte[]] $Message
  )

  $bytes = $Message.Length
  $n     = New-Object Byte[] 16
  $out   = @()

  while (0 -lt $bytes) {
    $output = Salsa20-Core $Key $n
    [UInt32] $nonce = Salsa20-LoadLittleEndian $n[8..11]
    for ($i = 0; $i -lt [Math]::Min($bytes, $BlockSize); $i++) {
      $out += [Byte]($Message[$nonce * $BlockSize + $i] -bxor $output[$i])
    }

    $nonce = Salsa20-Sum 1 $nonce
    $new_n = Salsa20-StoreLittleEndian $nonce
    for ($i = 0; $i -lt 4; $i++) { $n[8 + $i] = $new_n[$i] }

    $bytes -= $BlockSize
  }
  Return $out 
}

Be aware this method as shown has a much lower limit on the number of nonces that can be used (2^32) than the real cipher can allow (max 2^70). Implementing higher nonce support is left as an exercise for the reader. Other improvements are possible and I may share them later, but this is just a basic framework to get you started. (Not-so-subtle hint: nonces don't always have to start at zero. What does $n[0..7] do, anyway? Caveat: PowerShell doesn't do "pointers", exactly.) Given the above Salsa20 PowerShell definitions, you can encrypt just about any series of bytes you'd care to encrypt, and more quickly than AES, too:

$utf8       = New-Object System.Text.UTF8Encoding
$rnd        = New-Object System.Random
$key        = New-Object Byte[] 32
$plaintext  = $utf8.GetBytes("Hello world!")
$rnd.Nextbytes($key)

Write-Host ("key={0}" -f (($key|%{$_.ToString('x2')}) -join ' '))
Write-Host ("plaintext={0}" -f (($plaintext|%{$_.ToString('x2')}) -join ' '))

[Byte[]] $ciphertxt = Salsa20-EncryptBlock -Key $key -Message $plaintxt
Write-Host ("ciphertxt={0}" -f (($ciphertxt|%{$_.ToString('x2')}) -join ' '))

[Byte[]] $decrypted = Salsa20-EncryptBlock -Key $key -Message $ciphertxt
Write-Host ("decrypted={0}" -f (($decrypted|%{$_.ToString('x2')}) -join ' '))

And the output looks, for example, like so:

key=f8 34 5f a7 4e 93 cf 90 5a 3f 8d fc ad 5e 93 d9 d2 c2 f9 69 f7 42 b0 5d a6 f0 69 c8 ed 4c aa 82
plaintext=48 65 6c 6c 6f 20 77 6f 72 6c 64 21 # "Hello world!" in hex
ciphertxt=31 24 a4 d6 46 cb ad 03 92 ec e7 f8 # encrypted gibberish
decrypted=48 65 6c 6c 6f 20 77 6f 72 6c 64 21 # "Hello world!", decrypted

How much faster is Salsa20 than AES? Well if my "works, just barely" AES-in-PowerShell code could do 4 bytes per second, this Salsa20 implementation does, by my "whatever I have laying around" measurements, about 461 bytes per second. That's still awful, but we're sticking to only-a-PowerShell-script here, and it's still orders of magnitude faster than the other PowerShell script. I call it a win.

It may not be RSA in 3 lines of Perl, but let's face it, sometimes they won't let you use Perl. Philistines. Wheresoever possible, use a serious implementation of a cipher you trust. But for anywhere that encryption is forbidden, make your own in the bathtub of the world.

I welcome cryptanalysis on this implementation. Though you should NOT consider this Salsa20 implementation to be serious cryptography, I've found that it meets the published results of the examples in the Salsa20 paper. Crypto is easy. Good crypto is hard. If there is an error in this code, obvious or subtle, I would like to know about it.

[0] They called it "The Great War", which it probably was, comparatively speaking. Adolf Hitler earned a medal during it and everything.

[1] Most CPU manufacturers have dedicated AES support in their hardware. But in a secure environment that won't even let you create System.Text.ASCIIEncoding objects, which is really just a mapping that says "A = 65, B = 66..." do you really think they'll let you use the hardware crypto APIs?

No comments: