Category: general

general category for the misfits

  • Thought experiment: Back to The DB™?

    How did we get here?

    If you’ve been around for a while, or worked in a start-up that didn’t think it had to scale to the bazillions from the get-go, you’ve probably started out with one database and one monolith. Things were smooth, things were fast. You would care about squeezing the last drop of juice out of that SQL query with some late row binding black magic, just because you felt it was your biggest pain at the moment. In which module does this functionality go? What service do I need to call? All RPC, just without the Aarrrrrr. Once things had grown, you might have played around with an app-internal event bus or observers and fought lateral calls, layers, and onions.

    We then had to evolve and no longer thought about which module to put behaviour in, but which service. Which domain does this belong to? Do we need to call it alongside something else like 99% of the time? If you were unlucky, you scaled out to several services without distributing ownership of the data. Your services would wildly r/w „to and fro“ one DB and any table, perhaps all with the same DB user just because, and soon you’d find yourself in a schema gridlock: changing anything in your DB schema was an insane effort because you didn’t even know who was using it, let alone what all of it even meant.
    Perhaps you were lucky and one kind of data had exactly one owner.

    Now you had these twenty CRUD services devoid of any real meaning. Cheap to write, yet somehow still full of bugs and hard to change. One thing happening in your business would cause eleven calls, a cascade of circuit breakers, reasonable timeouts, and retry strategies you had in place, right? Right?

    Alright, this world is kinda boring to live in and despite its simplicity somehow hard to comprehend. We can do better. What if we cared about business processes again? What if we spoke the same language as the business? What if things … happened and we reacted to them? Remember the event bus we fiddled around with earlier? Let’s make the difficulties of complex adaptive distributed systems explicit. Let’s do event-driven architecture (EDA) with event-carried state transfer (ECST).

    Now this is a nice place to live in. Yeah, we have to care about order and eventual consistency and choreographies and idempotency and what goes into this event actually? But the things happening in our system are the things happening in the business. Ain’t that neat?
    Yet still, you have all these more or less moving parts over and over again. If you fully rely on ECST and your service needs nine different aggregates that’s (hopefully) nine streams or topics to listen to, nine projections, nine evolving schemas to comply with, perhaps nine inboxes (and outboxes on the other side), DLQs, alerting, probably revisions to track, and perhaps dangling references you need to wait for to be resolvable.

    Square one?

    So, six paragraphs into this piece – what if we went back to square one? It’s 2024 and we have globally distributed, highly scalable databases. What if we got back to essentially having just The Database™? I am not talking about an event store (which, too, is basically the database). Each table would have its owner (writer) and could be a shard or instance. Every reader could read from some replica. Your service would still have its own storage, invisible to others. But alongside it, probably in a system-level namespace you’d also have an integration table. Its schema is the contract you adhere to. If you need to change it you can create a new v2 table, write to both and drop v1 after a while. If you needed to have all the history you’d be free to store it – just add a revision column to your PK.
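
    To make the revision idea a bit more tangible, here is a tiny in-memory sketch in JavaScript. It is not a real database, and all names in it (integrationTable, publish, latest, the customer rows) are made up for illustration. The only point is that the key consists of the id plus a revision, so history is kept and the current state is just a view over it.

```javascript
// In-memory stand-in for an integration table with PK (id, revision).
// All names and rows are hypothetical.
const integrationTable = new Map()

// Highest revision stored for an id so far, 0 if there is none.
function latestRevision(id) {
    let max = 0
    for (const row of integrationTable.values()) {
        if (row.id === id && row.revision > max) max = row.revision
    }
    return max
}

// Only the owning service would call this (single writer per table).
function publish(id, data) {
    const revision = latestRevision(id) + 1
    integrationTable.set(`${id}#${revision}`, { id, revision, data })
    return revision
}

// A reader's "projection": the latest revision per id.
function latest() {
    const view = new Map()
    for (const row of integrationTable.values()) {
        const known = view.get(row.id)
        if (!known || row.revision > known.revision) view.set(row.id, row)
    }
    return view
}

publish('customer-1', { name: 'Anne' })
publish('customer-1', { name: 'Anne Smith' })
publish('customer-2', { name: 'Tom' })

console.log(integrationTable.size)                // 3 rows of history
console.log(latest().get('customer-1').data.name) // Anne Smith
```

    In an actual database the latest() part would presumably be a view and publish() a plain INSERT by the owning service.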

    What would we gain?

    No need for outboxes, inboxes, buses, async schemas, queues, topics, and the like. Writes are also guaranteed in-order (so are reads). No dangling references. Instead: actual referential integrity. No translation to a different technology and back again. Your projections could be „actual“ projections – just views on the tables you’re interested in. No replays. We would pretty much not have to bother with the async nature of a distributed system. Sure, there’s replication lag – but if you require the exact version of something you have to reference it explicitly anyway, and that is rarely the case. Real DBs are as fast as it gets (not that messaging was slow, even if done via polling and HTTP).

    Simple use cases could stay very simple. You read from and write to some tables. Perhaps we’d even realize the translation from and into our private tables as triggers so that we really don’t need to care about the integration. It’ll happen behind the scenes. We’d be living in single-DB monolithic wonderland again.

    What would we lose?

    Kinda one of the biggest benefits – the actual events. The what relevant thing has happened in the business? We would have to emulate this. If we kept history we could have an event column right next to the revision (or whatever clock). We would then have to poll and scan from our current cursor in time – just as with streams. Or we write a separate feed / journal.
    What about push-based delivery? We could probably build something with change data capture (CDC) or triggers and webhooks, basically.
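
    Polling such a feed from your own cursor could look roughly like this. Again, this is just an in-memory sketch under assumptions: the feed array stands in for a table, and the event names, poll and makeConsumer are invented for the example.

```javascript
// Hypothetical feed: rows ordered by a monotonically increasing revision.
const feed = [
    { revision: 1, event: 'OrderPlaced',  orderId: 'o-1' },
    { revision: 2, event: 'OrderPaid',    orderId: 'o-1' },
    { revision: 3, event: 'OrderShipped', orderId: 'o-1' },
]

// Roughly: SELECT * FROM feed WHERE revision > cursor ORDER BY revision LIMIT n
function poll(cursor, limit) {
    return feed.filter(row => row.revision > cursor).slice(0, limit)
}

// Each consumer keeps its own cursor and reads at its own pace.
function makeConsumer() {
    let cursor = 0
    const seen = []
    return {
        seen,
        catchUp(batchSize) {
            let batch = poll(cursor, batchSize)
            while (batch.length > 0) {
                for (const row of batch) {
                    seen.push(row.event)
                    cursor = row.revision
                }
                batch = poll(cursor, batchSize)
            }
            return cursor
        },
    }
}

const consumer = makeConsumer()
console.log(consumer.catchUp(2)) // 3
feed.push({ revision: 4, event: 'OrderDelivered', orderId: 'o-1' })
console.log(consumer.catchUp(2)) // 4
console.log(consumer.seen.join(', '))
```

    The consumer decides the batch size itself, so a slow reader just falls behind for a while instead of getting flooded.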

    What about commands? Or anything else we’d send as a message? We could use the same thing. Add a command inbox. And if we need a response? The issuer could have a public inbox itself we write to.

    What else?

    We’d force everyone to use the DB technology the ether provides. Every service could still use whatever storage fits its use case best, but for most services it would probably make sense to just use the very same thing. This could both be an advantage and a danger.

    We’d have each and every piece of integration-relevant data at least twice. But that’s no different from actual EDAs.

    RDBMSs would provide us with fine-grained access control – something a lot of EDAs lack.

    If you don’t need a projection a delete is an actual delete (whether it’s soft or not). Great when you face GDPR. We think of The Log as a good thing and it is, but sometimes you just have to delete stuff. At the same time schema migrations would be rather easy. Create your v2 table as a copy of v1 and make your changes. Or run an explicit migration if it’s more complex. Here, you can rewrite history (of course you need to see how this affects your feed or history if you have one).

    Without any queues in place there’s no buffer. But we don’t actually need one. Just as in stream-based EDAs, a consumer would simply continue reading The Log from its own cursor. So consumers can catch up and don’t get flooded.

    The DB would be the central SPOF and bottleneck – just like the bus or your Kafka or what have you.

    Conclusion

    In my head this plays out quite nicely. You would probably poll a couple of tables that are relevant to you or their „feeds“. Or perhaps there’s just one feed and you just SELECT the ones you need. Perhaps your service has its own instance, replicating from the other instances‘ WALs (The Log ;)). So you would turn the messaging into some sort of federated WAL-based replication. Or you scale your DBs according to your overall load. But then a heavy writer would impact all other services. So we’d probably want to write to our instance and have others read from the WAL. Having just one global feed would then, sadly, be difficult again at certain volumes.

    How could we get there?

    To me, without having invested too many thoughts into this, the most reasonable thing seems to be having one instance per service. It’s the authority for your tables and replicates from the WALs of the stuff you’re interested in. Of course, you’d want more than just one instance as a fail-over for outages or maintenance. Perhaps even a global cluster. It’d be neat if you could just have some extra instances in your global pool that could dynamically be claimed for such scenarios.

    You’d probably want to be able to GRANT SELECT ON some_table* so you don’t have to change permissions each time you push a new schema version.

    In conclusion, someone would probably have to sit down and tweak postgres to do those things :)

  • Loops for Beginners

    This is a shallow tutorial about loops for programming beginners (like my wife). It should get you kick-started but please don’t stop here. There’s a lot more to explore.
    I chose JavaScript since you can run it right here in your browser’s dev tools (CTRL+SHIFT+i in most browsers, then choose console).

    There might be some things in the code examples that don’t make sense to you yet. Just note them down and don’t fall into the rabbit hole just yet. I provide some extra hints and rabbit holes. Try to sweep over this once without following the white rabbit to focus on the topic.

    Prerequisites:

    • you want to learn this stuff – at least at a practical level
    • you’ve seen JavaScript (JS) before
    • you’ve read the term iteration before
    • you have at least an idea what some of these mean: collection, set, list, vector, array, iterable
    • you know basic data types such as Booleans

    Aight, let’s start with a problem. Let’s say you program a little game for a set of players. There’s Tom, Anne and Jane. You want to greet them before explaining the rules. That’s easy:

    console.log('Hi Tom, Anne and Jane.')

    Fair enough. Now imagine you have twenty players. Things start to get a little tedious but that’s still manageable. Alright.
    Now think you want to open this up to an unknown set of players. At the time you write your program you don’t know how many players there will be nor their names. Now how do you solve this problem?

    Take a moment to figure something out.

    .
    .
    .

    Right, how could the answer not have been „loops“?

    There are pretty much two and a half types of loops in almost all languages and they come in some different flavours.

    A loop consists of some condition or control structure and a body. The body gets executed over and over again as long as the condition holds. In C-style languages the basic form is some keyword(s) (condition in parentheses) {body in curly braces}.

    First there is the while loop. It keeps executing its body as long as a given expression stays true (and you do not break out of it).

    As long as there’s dirty dishes:
    ____do the dishes

    or:

    While there are dirty dishes:
    ____wash
    ____rinse

    Let’s see how it can solve our problem:

    alert("Who's playing? Type in player names one after another. Type 'done' when you're done.")
    let players = []
    let isDone = false
    
    while (!isDone) {
        let player = prompt('Who are you?')
        if (player === 'done')
            isDone = true
        else
            players.push(player)
    }
    alert('Hello ' + players.join(', '))

    What happens here? As long as we are not done (isDone is false), we will keep asking for player names. If the user types in done we will set isDone to true. The next time the loop’s condition is checked, !isDone evaluates to false and the program flow continues past the loop’s body. In JS and many other languages the form is „while (condition) { stuff to do as long as condition holds }“.

    Hint: For one-liners you can omit the curly braces.
    Hint: There’s often also do-while of the form „do { stuff } while (condition)“ or do-until where the first iteration of the loop is executed before the condition is checked. Can be handy but you won’t see it often.
    Rabbit Hole: In sane languages, code outside the loop’s body has no access to things declared inside of it.
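
    Here is a small sketch of the do-while form from the hint above. To keep it runnable without clicking through dialogs, the answers come from a made-up array (answers) instead of prompt; the names are only for illustration.

```javascript
// Pretend these are the things a user types in, one after another.
const answers = ['Tom', 'Anne', 'done']
const greeted = []
let answer

do {
    // the body always runs at least once, the condition is checked afterwards
    answer = answers.shift()
    if (answer !== 'done')
        greeted.push(answer)
} while (answer !== 'done')

console.log(greeted.join(', ')) // Tom, Anne
```

    Note that answer is declared before the loop. The condition sits outside the body’s curly braces, so it could not see a variable declared with let inside of them.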

    Great! Now try to output the collected player names with a while loop.

    Let’s continue with the classic for loop. It is the same as the while loop but with some handy extras. It also keeps executing its body as long as a condition holds. But it allows you to declare and initialize a variable and to provide a statement to be executed after each iteration. The common form is „for (initializer; condition; post-iteration statement) { stuff to be repeated }“.

    A speed date:
    For: pour in one glass of wine; not more than ten in total; refill after each round:
    ____have a conversation
    ____drink wine
    ____move to next table


    Let’s see it in action:

    players = []
    let numPlayers = prompt('How many players are there?')
    alert("Who's playing? Type in player names one after another.")
    
    for (let i = 0; i < numPlayers; i++) {
        players.push(prompt('Who are you?'))
    }
    alert('Hello ' + players.join(', '))

    Rabbit Hole: See JS‘ variable declaration to understand what let does here.

    What’s going on here? We first ask for the number of players. Then, we execute the loop’s body this many times. We declare a variable i and initialize it to zero. This is only done once, before the loop’s body executes. The next thing that happens is that the loop’s condition gets checked. If you entered five, for example, 0 < 5 holds and the body gets executed. After that, the post-iteration statement gets executed (i++ in this case is the same as i = i + 1) and i will be incremented by one. It is now 1 (0 + 1). The condition still holds so we go for another round and so on until 5 < 5 evaluates to false. What values does i take one after another?

    0, 1, 2, 3, 4.
    You could as well have started at 1 and chosen i <= 5 as your condition. Or i < 6. You could have chosen 5 as initial value for i and counted down: for (let i = 5; i > 0; i--);. Or you could have counted until 10 with two-step increments. In our case the value of i only matters for the condition and we want to run the body five times.
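
    If you want to see these variants side by side, the following sketch collects the values i takes in each of them. The array names are made up for this example.

```javascript
// Collect the values i takes in each variant so we can look at them.
const upFromZero = []
for (let i = 0; i < 5; i++) upFromZero.push(i)

const upFromOne = []
for (let i = 1; i <= 5; i++) upFromOne.push(i)

const countdown = []
for (let i = 5; i > 0; i--) countdown.push(i)

const twoSteps = []
for (let i = 0; i < 10; i += 2) twoSteps.push(i)

console.log(upFromZero) // [0, 1, 2, 3, 4]
console.log(upFromOne)  // [1, 2, 3, 4, 5]
console.log(countdown)  // [5, 4, 3, 2, 1]
console.log(twoSteps)   // [0, 2, 4, 6, 8]
```

    The values differ but each body ran exactly five times, which is all we needed.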

    But this loop style comes in handy when you have a reason to know that it’s currently the nth iteration of the loop. Let’s change our example a little with emphasis on the first and last players:

    players = []
    numPlayers = prompt('How many players are there?')
    alert("Who's playing? Type in player names one after another.")
    
    for (let i = 0; i < numPlayers; i++) {
        if (i == 0)
            players.push(prompt("Who's first?"))
        else if (i == numPlayers - 1)
            players.push(prompt("And finally:"))
        else
            players.push(prompt("Who else?"))
    }
    alert('Hello ' + players.join(', '))

    The first and last player now get a special prompt.

    Try to output the player names by using a for loop.

    But what if you don’t care whether it’s the first, second, or one millionth iteration? Do you always have to jump through all the counting hoops? Of course not. There’s another flavor of the for loop, often called foreach, for-in, or for-of. It (usually) iterates over something whose size you (or rather the computer) know. Let’s use it to output the players’ names:

    players = []
    numPlayers = prompt('How many players are there?')
    alert("Who's playing? Type in player names one after another.")
    
    for (let i = 0; i < numPlayers; i++) {
        players.push(prompt("Who is it?"))
    }
    
    for (let player of players) {
        alert('Hello ' + player)
    }

    This is way less to type and to think about. You can read it as „for each x of some collection y do z“ where x is substituted with one value of y after another within your loop’s body z.

    Can you do the name-prompting loop with this type of loop? Don’t worry if you can’t but give it a try.

    If you failed to do it there’s a reason for it. There’s a reason we have different types of loops – because they are useful in different situations. But still, they are (usually) equivalent. You can do the same things with them, it just gets a little awkward:

    for (let x of Array(5)) 
        players.push(prompt("Who is it?"))

    Here’s another one for the classic for loop:

    numPlayers = prompt('How many players are there?')
    players = []
    alert("Who's playing? Type in player names one after another.")
    
    for (let i = 0; i < numPlayers; i++) {
        players[i] = (prompt("Who is it?"))
    }
    alert('Hello ' + players.join(', '))

    We set the ith element of the array to the value that was just entered.

    When to choose which?
    Use while for unknown sizes or non-countable cases.
    Use classic for when the current index matters.
    Use foreach for limited collections, when only the values matter – or if your language supports it for key-value cases, too.

    In reality you will probably most often iterate over some known set (maybe unbounded) and a foreach loop is usually most concise.
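
    For the key-value case mentioned above, JS has you covered as well. A small sketch with made-up data (team, scores):

```javascript
const team = ['Tom', 'Anne', 'Jane']
const lines = []

// entries() yields [index, value] pairs, so for...of can give you both
for (const [index, name] of team.entries()) {
    lines.push((index + 1) + '. ' + name)
}

// Object.entries() does the same for the keys and values of an object
const scores = { Tom: 3, Anne: 7 }
for (const [name, score] of Object.entries(scores)) {
    lines.push(name + ' has ' + score + ' points')
}

console.log(lines.join('\n'))
```

    So you get the index without doing the counting yourself.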

    Bonus: break and continue
    You can either break out of the loop’s body at any time via the break keyword or skip one iteration via the continue keyword.

    alert("Who's playing? Type in player names one after another. Type 'done' when you're done. Note that no Bobs are allowed to play. Sorry Bob.")
    players = []
    
    while (true) {
        let player = prompt('Who are you?')
        if (player === 'done')
            break
        else if (player.toLowerCase() === 'bob') {
            alert('I said NO BOBS!')
            continue
        }
        else
            players.push(player)
    }
    alert('Hello ' + players.join(', ') + ' ... and no Bob')

    Bonus: You don’t need the initializer and post-iteration clause for the classic for loop. Actually not even the condition, it’s implicitly true. for (;;) break is perfectly fine. Or for (;;); – but you shouldn’t run it. What would happen if you did?
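
    A version of this that is safe to run could look like the following sketch. The counter rounds is made up for the example.

```javascript
// A classic for loop with all three clauses left out.
// The break is the only way out of it.
let rounds = 0
for (;;) {
    rounds++
    if (rounds === 3)
        break
}
console.log(rounds) // 3
```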

    There’s more to it but these building blocks should make you „Turing complete“ already. You can nest loops, mostly do whatever you want inside their bodies or compute all the prime numbers … if you have some spare time. As a final task, do compute the prime numbers between 1 and a given number. It doesn’t have to be efficient, just make it work.

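    let upperLimit = Number(prompt('Upper limit, > 1'))
    let primes = []  // 1 is not a prime number, so we start out empty
    for (let i = 2; i <= upperLimit; ++i) {
        let halfI = i / 2
        // 2 is the only even prime, all other even numbers can be ruled out right away
        let isPrime = i == 2 || i % 2 != 0
    
        if (isPrime)
            // no divisor of i (other than i itself) can be larger than i / 2
            for (let j = 2; j <= halfI; ++j) {
                if (i % j == 0) {
                    isPrime = false
                    break
                }
            }
        if (isPrime)
            primes.push(i)
    }
    console.log(primes)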

    Rabbit Hole: Look into FRP, things like map, filter, reduce, folding, streams.
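
    To give you a first taste of that rabbit hole, here is a sketch of filter, map and reduce on an array of made-up names. Each of them loops over the array for you.

```javascript
const names = ['Tom', 'Bob', 'Anne', 'Jane']

// filter keeps the values for which the function returns true
const allowed = names.filter(name => name.toLowerCase() !== 'bob')

// map turns every value into a new one
const shouted = allowed.map(name => name.toUpperCase())

// reduce folds all values into a single one
const totalLetters = allowed.reduce((sum, name) => sum + name.length, 0)

console.log('Hello ' + shouted.join(', ')) // Hello TOM, ANNE, JANE
console.log(totalLetters)                  // 11
```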

  • Agile is like …

    … stochastic gradient descent translated to project management ;)
    … a Bayesian approach to navigating through the cone of uncertainty.
    … a subscription style of building a thing.
    … checking if the cookies are well-baked every couple minutes (or if they, in fact, are still cookies).
    … switching between exploration and exploitation iteratively.
    … divide and conquer, with economic ordering.
    … watching RL stick-figures learning to walk.
    … asking how to get from A to B over and over again because you don’t trust anybody actually knows the (whole) way and you can’t remember the lengthy descriptions anyway.

  • Crypto DNA

    Maybe we should sign our DNA / mRNA to defend us against virus attacks. But we then have to protect our nuclei against malicious root certificates or corrupted keys. Maybe we will also have to encrypt our code, so that viruses and other attackers will have a hard time. But how do we swap a corrupted root key? How do we increase the entropy when our key is not secure enough anymore? And how much energy is consumed for all the crypto?

    Never forget to refresh your certificates or your body will DoS on you.

  • deno js

    Took me some time to realize: node, no-de, de-no, deno
    :insertFacepalmHere

  • How to fix broken Ubuntu 20.04 upgrade in WSL 1 / libc6 & sleep

    I recently ran sudo do-release-upgrade -d to upgrade my Ubuntu 18.04 LTS inside WSL 1 to the new Focal Fossa and ended up with a broken system.

    Setting up libc6:amd64 (2.31-0ubuntu9) ...
    Checking for services that may need to be restarted...
    Checking init scripts...
    Nothing to restart.
    sleep: cannot read realtime clock: Invalid argument
    dpkg: error processing package libc6:amd64 (--configure):
    installed libc6:amd64 package post-installation script subprocess returned error exit status 1
    Errors were encountered while processing:
    libc6:amd64
    E: Sub-process /usr/bin/dpkg returned an error code (1)

    After some investigation I fixed it by editing /var/lib/dpkg/info/libc6:amd64.postinst – I just needed to remove the sleep from the script.

    After that a simple sudo apt --fix-broken install fixed the issue and I was able to finish the upgrade via sudo apt dist-upgrade.

  • Paragon review

    First post in a series of some short game reviews.

    tl;wr – Walking Simulator 2017

    So, I played Paragon in July 2017 and I gotta say that it bored me. You like LoL/Dota? Well, it’s like live action Heroes of the Storm. But strip the action. And remove the fun. And awesome support classes. Depending on your class and the situation you have to recall to your base to buy some cards (i. e. items) or to refill potions and hp/mp. Too bad your walking speed is so realistic that you have to take a 5 minute stroll back to the frontline.

    Aim’s not really important. It’s more like shooting with pump-action shotguns. Actually you can shoot pump-action shotguns in Paragon.
    The card system confuses you the first time but turns out to be just like item systems – juuust less intuitive.
    Like in most other MOBAs, PvE ain’t no challenge. I’ve never lost a single PvE game with randoms.

    What really annoyed me is that you can’t just unlock new characters by spending gold or something. No. You gotta log in like 5 days in a row to unlock the next tier of characters and like … 10 days? … to get all of them. After a couple matches I mastered all starter chars and there was no way to play another. I just had to wait days(!) to play some new character … Jesus!

    Definitely not worth downloading the ~20 gigs.

  • Seems like I haven’t yet …

    Seems like I haven’t yet …

    … posted my GRAVITY comic:

     

    'cause Ryan Stone ain't quite what (s)he seems to be

  • pointless

    pointless

    it’s so pointless