Hashing It Out

Episode 125

Data Infrastructure Pt. 2

ABOUT THIS EPISODE

Hashing It Out continues its series on blockchain infrastructure with part 2 of the data layer.

In this episode, Corey and Jesse talk to Jacek Sieka from Status.

We are working with Infinity Keys to give listeners the chance to get a free NFT. Listen to the episode for the password and use it at http://infinitykeys.io/puzzle/hiodata.

We didn't really introduce ourselves, should we? They know who I am, they know me.

Welcome to the Hashing It Out podcast, where we talk to the tech innovators behind blockchain infrastructure and decentralized networks. Your hosts are Dr. Corey Petty, currently doing research at Status and waiting for other people to keep up; Jesse Santiago, a former electrical engineer now working on decentralized storage at Status; and, with the deep voice and the deep questions, Dee Ferguson. And I'm the Hashing It Out showrunner, Christian Noguera.

This is part two of our episode on data, sponsored by ZenGo. Hey everybody, it's Dee here. You know, I've been using ZenGo. It's a crypto wallet. We all know crypto wallets have been in crypto for a mighty long time, and we know all the crypto woes that come with crypto wallets. But let me tell you a little something about ZenGo: it's a simple wallet. You know why I like it? I like simplicity. I need simplicity in my life, and ZenGo is a simple wallet. It's custodial, but you know what, just because you don't have your keys doesn't mean you can't have ease. So I gotta say, I like ZenGo. I like ZenGo because it's easy, and I think you'll like it too. So go to zengo.com, download the wallet, give it a shot.

This week, Corey and Jesse talk to a researcher. Stick around after the official interview for a wrap-up chat with Corey, Jesse, and Dee.

Welcome back to the show. Today we are on our second episode about blockchain data: how it scales, what it is, what it's for. And today we have Jacek. Tell us a little bit about what you do and where you come from; I kind of like your background on it.

Yeah, hey everyone, nice to be here. You kind of introduced me already. Why am I here? Well, I got to know Corey at Status; we both worked there. I'm part of the research group; we have a fairly big one now at Status. My background is in computer science, quite obviously, but I've been playing around with various internet technologies since they started to change society, basically. This was maybe in the nineties, when people started getting access to the internet, and I think one of the big changes that was brought about back then was really access to information and data. So it was pretty cool to be part of. And now I guess we're doing the same, just at a different scale and with a different model, a little bit. Yeah, that's what I do.

How old were you, Jacek, in the nineties, if you don't mind me asking?

You know, now I'm forty-two, forty-three, so back then I was, let's say, thirteen or so. We had internet at school, and then I got it at home. But before that, we used to run these systems called bulletin boards, BBSes, right.
So it was basically on a phone line. You would dial somebody with your computer and then you would download whatever information they had, and you would do this periodically. And then you had message boards where you could send messages to people, and they would get propagated through this chain of bulletin boards, all operated over phone lines, like pre-Internet tech. And then obviously things got a little bit faster when the Internet and ordinary routers took over that role. But that's around the time when I started to understand the level of technology that was available at the time. Yeah,...

...back then, like you know, the Internet was quite distributed or decentralized, in the sense that it was very much peer-to-peer. Like you just said, you dialed into someone, you downloaded their things, and you had some level of propagation from one person's bulletin board to the next. In this show series we've gone through the concept of how we now build these distributed networks, or blockchain networks, and we've found a way to decentralize that data, or at least the data source, and to have a level of trust that the data is not manipulated, which we didn't really get with the early Internet.

When we interviewed Dmitry in the last episode, we came across a concept that I thought was interesting: that the purpose of the data in blockchain networks is specifically for verification. It started with Bitcoin, where you needed to be able to go back to the genesis block in order to properly verify that every change to the data was done correctly. But as we've scaled out these networks, and with the way they're growing now, that's not necessarily the case anymore, and we're throwing away a little bit of that. So I kind of wanted to dive into the point of data in blockchain networks. Let me start with you maybe giving your view of how data structures have evolved over the course of blockchain networks since their inception.

Huh, that's a big question. I guess we can actually start from my uni times again, because that was really one of those moments where things came together. Two things happened then. The first was that the Chord and Kademlia papers were written, and what they did was bring to life this idea of the distributed hash table. This was really a major advance in distributed data processing. Before that we had hashing, of course, so we could verify that data corresponded to some small value, but nobody had really thought about how to use a distributed network to store data, and how to assign data to nodes. There were a lot of things that came together then.

The other thing that was interesting was really BitTorrent. I think that was around 2004 or 2005-ish; Kademlia was 2001 or 2002, I think, something like that, and then BitTorrent. And what BitTorrent did was add a layer of economy on top of that. It's not really an economy as in the Bitcoin network; it's more like a barter system, where, you know, I give you this particular part of this particular file and I get a little bit back. So those two kinds of foundational research results, or technologies, laid the foundation for what we have today, which is these distributed networks that are governed by an economy, or by this economic thinking. And I think it's interesting to look at it from this perspective: first of all this barter system, where you assign some kind of value to data, and then you also have the nuts-and-bolts level, which is basically how you distribute that data in a reasonably efficient way. Those are two aspects of blockchain that are really important to cover.
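Before the story continues, here is a minimal sketch (mine, not from the episode) of the Kademlia idea Jacek just described: node IDs and content keys live in the same hash space, "distance" is bitwise XOR, and each piece of data is assigned to the nodes whose IDs are closest to its key. All names are illustrative.

```python
import hashlib

def node_id(name: str) -> int:
    # Kademlia-style 160-bit ID derived from a hash (SHA-1 here, for illustration).
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    # The Kademlia distance metric: bitwise XOR of two IDs.
    return a ^ b

def closest_nodes(key: int, nodes: list[int], k: int = 3) -> list[int]:
    # Data for `key` is stored on the k nodes whose IDs are XOR-closest to it.
    return sorted(nodes, key=lambda n: xor_distance(key, n))[:k]

nodes = [node_id(f"node-{i}") for i in range(20)]
key = node_id("some-file.txt")
print([hex(n)[:10] for n in closest_nodes(key, nodes)])
```

The point of the XOR metric is that every node can compute, with no coordination, which nodes are responsible for any key, which is what makes lookup and storage assignment work without a central index.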
And then there was a period of quiet, right? There was a Web 2.0 period where everybody was excited about huge data centers, so these distributed systems got a little bit less attention, let's say. Everybody was very excited that Google could offer you a gigabyte of email storage, or that Maps could show the whole world in satellite images on your computer. That was really cool. So maybe around 2010 there wasn't that much happening, except in the background, where people like me and a lot of other peer-to-peer hackers were still excited about the potential of these more distributed systems. And then came along Bitcoin,...

...and it put together a lot of this past research, and it kind of generalized the value model of BitTorrent as well. No longer did you have to trade data for data; you could think about value, and represent value in a data model that got computed in a distributed way, such that we could all trust it without having to trust any single participant. So that was when the second revolution came along. And then, of course, the third revolution was that you added execution on top of that, like Ethereum did. And now we've gone back a little bit, in the sense that we've separated out the execution again and we're talking exclusively about data. So there's a pendulum that swings there: you take an idea, you refine it a little bit, you reach a plateau; you develop it again, you reach another plateau; and then you kind of swing back and ask, what are we really doing, why are we at this plateau? And now people are talking a lot about data availability as the next big problem in distributed data storage.

Do you agree with the way that Ethereum is going about solving data availability?

Yeah, I mean, it does make sense. You have this concept of integration, right? When you look at an application stack like Ethereum, you're integrating a bunch of services, and once you've gotten the model as a whole to a decent state, like it's kind of working, the next step is really to look at the pieces individually again. And although data availability has always been a requirement for a system like Ethereum or Bitcoin, or any other blockchain really, I think what's happening now is that we're just examining it, shining a light on that particular problem. Previously we'd assume that people would make data available, and now we're thinking about what happens when they don't. And the second thing we're thinking about is: let's leave the execution till later. Before we can execute, we need to have access to data. What can we do when we have access to data, and how can we break the system when that access is taken away? So looking at data availability as a problem in isolation, first of all, enables new ways to think about what happens later, by creating a set of properties and an interface for that data. And with that interface defined, execution with the EVM is one model that you can put on top of that stack, but you can also think about other kinds of stacks, like L2s, where what really matters for them is transaction ordering.

It's an interesting way to view it, especially the sort of history of it. The first implementations of all these blockchain networks were just about getting it all together, so that we had some semblance of a working system where value could move between computers in a peer-to-peer network, and we ended up with this concept of digital scarcity that we can do stuff with. And now that we have these systems, we've pushed them to the limit.
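For context on the Ethereum direction being asked about: the approach generally discussed on Ethereum's roadmap is data availability sampling, where light clients each fetch a few random chunks of a block rather than the whole thing. A toy calculation (mine, with made-up parameters) of why a handful of samples goes a long way:

```python
# If a block producer withholds a fraction `f` of the chunks, each uniform
# random sample hits a missing chunk with probability f, so the chance that
# k independent samples all miss the withholding is (1 - f) ** k.
# With erasure coding, an attacker must withhold a large fraction (e.g. half)
# to prevent reconstruction, so even small k catches it with high probability.
for f in (0.5, 0.25):
    for k in (5, 10, 20):
        p_caught = 1 - (1 - f) ** k
        print(f"withheld={f:.2f}  samples={k:2d}  detection probability={p_caught:.6f}")
```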
I often have this analogy: we built the bucket, and then we filled it up and found out where all the holes are, where it's leaking, and now we need to figure out how to either patch those holes or build better buckets. And what you're saying is that most of the work now is identifying that there are a bunch of holes in the way we treat access to the data, and in our ability to scale the whole thing when we have a certain level of guarantee that access to the data is there. Once we have that, we can kind of do whatever...

...we want. And that's kind of the subject of the next part of the series: now that we have the data, what do we do with it, which can go into maybe different execution models. But how do you see that scaling? Why is access to the data a priority over what we do with it?

I think it's worth taking a little step back there and looking at why people store data. I think we can broadly divide the world into two different models. One is where the data has some value to the person or node storing it, and the other is where it doesn't, or at least doesn't naturally have any value. Now, looking at BitTorrent and Ethereum, for example, for the node operator the data has value: they need that data in order to receive their rewards. There are also plenty of people running nodes, I wouldn't call it altruistically really, but because they have a business case around having that data available. It's directly linked to their applications. So even though they're not primary users of the data, they have an indirect use of that data being available, and as such they're willing to expend some resources to process it, to store it, to distribute it, and so on. And here it's kind of like the barter model in BitTorrent: if I have a particular file that I want to download, it's kind of in my interest to make the other parts of that file available, because the economy of BitTorrent is such that if I give somebody a little piece of data, they're going to give me a piece of their data and dedicate more bandwidth to me.

If it's random data that somebody just wants to store on the Internet, that nobody else is interested in, let's say, I don't know, my encrypted backup, there's no reason for them to be storing that data. So what happens next is that if you look at this kind of setup, where you have a third party storing data on your behalf, so to speak, you need to compensate them somehow. And the next step then is: if you're compensating them, how do you verify that they're actually doing the job? Before, we had this model where everybody was just assumed to be doing the job of storing the data, because it was in their interest, and they couldn't operate the system without storing the data. But now, when we're starting to decompose the system into its constituent pieces, we're looking at data in isolation. So when we look at the model where the data is perhaps not valuable to the person storing it, we need a way to verify that they're actually doing their job. We can no longer rely on this altruistic model where they have some other interest in keeping the data around, so instead we incentivize them to store the data and prove that they have it. So we're kind of artificially pumping up the value of the data from the perspective of the provider: no longer are they storing random blobs for nothing, they're storing random blobs for something, and instead of storing random blobs for their own interest, they're storing them for a profit. Once you've created a modular system like that, you can also swap out those modules and think about the properties that each of these modules has, and maybe you can come up with a different data storage model where the data is intrinsically valuable to its participants.
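A minimal sketch (mine, not from the episode) of the challenge-response idea behind "prove that they have it": the client remembers hashes of its chunks, then periodically asks the provider for a randomly chosen chunk and checks it. Real systems use Merkle proofs and erasure codes rather than keeping every hash, so treat this purely as an illustration.

```python
import hashlib, os, random

CHUNK = 1024

def split_chunks(data: bytes) -> list[bytes]:
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

# Client side: keep only the chunk hashes, hand the data to the provider.
data = os.urandom(10 * CHUNK)
chunk_hashes = [hashlib.sha256(c).digest() for c in split_chunks(data)]
provider_storage = split_chunks(data)  # what an honest provider keeps

def challenge(index: int) -> bytes:
    # The provider must return the chunk it claims to store.
    return provider_storage[index]

# Periodic audit: sample a random chunk and verify it against the stored hash.
i = random.randrange(len(chunk_hashes))
assert hashlib.sha256(challenge(i)).digest() == chunk_hashes[i]
print(f"chunk {i} verified; a provider that dropped data fails such audits over time")
```

The economic layer described above then ties rewards to passing these audits: a provider that throws the data away can answer a challenge only by luck, and repeated random challenges drive that luck to zero.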
When you're building a community, for example, although the data kind of looks random to an outsider, within that community it might hold value. So you can suddenly build networks that work on different kinds of economic models.

Yes, you bring more of that altruistic...

...sense back when you have a community.

Yeah, I wouldn't call it altruistic. It's really just that in a system where you don't trust, where you don't have a relationship with the service provider outside of the economic relationship you establish when you pay them, you need a common value system in which you can pay for that service. But within a community, those value systems are often not expressed in, you know, tokens, money, whatever. They might be expressed in trust, or in social credit instead. The point of the exercise is that not everything fits under a single model, and by taking the system apart a little bit, you can enable new use cases and new models.

And I'd argue that we have done that now, and this is the direction in which we're moving, because the model that we chose in the first place was not scalable. An example of that: if everyone stores everything, and, continuing along that line, the data in the system is being kept because it's necessary to operate, then I can trust that anyone in the system has all the information, because they have to have it in order to keep operating within the system. But if that community, if you will, grows to include a bunch of different actors there for various reasons, then it becomes untenable. And that's kind of where we're at now. If you look at Ethereum, if I want to run a node, I have to process transactions for everything in the entire system, regardless of whether I care about it, and that's just an unreasonable way to try to scale a system into a global financial network. So we're retooling it to allow different communities to care about different pieces of data, or to have a model where I can trust that people are storing something appropriately if I pay them for it. And so you're allowing multiple different communities to operate at the same time within the same network, under different models, or modules, or whatever you want to call it, because it's unreasonable to scale a network where everyone does everything all the time.

Yeah, exactly. And if you look at the Internet at large, that idea has been part of the system since the beginning. What's really interesting is the way the Internet grew, but also the way it can handle not being connected at all times to all of the Internet at the same time. So we have this idea, well, we used to have it at least, nowadays with IPv6 it's supposed to be different, but the idea of subnets: on the Internet you would have a kind of hierarchy where each node was independent until it got connected to its neighborhood, and then that neighborhood was kind of independent until it connected itself to the next neighborhood. And if the line between those two went down, which it frequently did in the past, within the community things would still keep going. And then when you re-established connectivity, things would sync up, and you would have to deal with updates on both sides and so on. And then we got a little bit lazy, because these things started working very well, too well almost, and we kind of started assuming that we're going to be online all the time and connected to everybody at all times.
And what we're seeing now is that some of the thinking that went into making these systems resilient in the past, simply because it was necessary from a quality-of-service perspective, is nowadays becoming interesting for different reasons. Connectivity gets limited, for example, because you're at an airport and the VPN or the firewall there doesn't allow you to connect to your service when you want to, or maybe you're on the corporate intranet and again there are rules preventing you from connecting,...

...or, in the case of nation states, there are firewalls being built around entire countries that permit or don't permit certain kinds of traffic. And you might find yourself in these networks and still want a reasonable expectation of your services working, especially when they're important services like communication, economic activity, and other things that we kind of take for granted in the modern world, at least in the convenient parts of the world in which I happen to live.

So what are the directions we're going in, in terms of trying to bring back a little bit of that previous concept: that I can be separated from the whole for a period of time, with a reasonable understanding that I can still operate locally, and then when I come back I have access to the things I had access to? Or, on the other side of that, that our resiliency is so much better that people's ability to censor me is further mitigated, such that building that firewall is much harder?

Well, there are several aspects to that question. One is that, in censorship as we're talking about it these days, there's the ability to censor somebody permanently versus the ability to censor somebody temporarily. This is a good example: when you go offline because you're in a tunnel, you're effectively censoring yourself for the duration of traversing that tunnel, but the harm incurred by that censorship is minimal, because as soon as you get out of the tunnel, you're fine. So I think that's one interesting thing that blockchains solved really well, which is basically that if at some point you recorded a particular transaction, or whatever it might be, it stays there. Even if somebody shuts you off from the system for a while, it's still going to stay there, and when you come back you can pick up from that point and prove to the world that something really happened. That problem is solved very well by the existing systems.

What's not so well solved is when you want to take new action: how do you ensure continuity of service, or at least partial continuity? And here it's a little bit like trying to deal with security. There is no perfect system; it's more of a cat-and-mouse game where you, as the person wanting to transact, usually have an advantage, and your advantage is really that you only need to get your message across once, whereas somebody who wants to prevent you from getting a particular message across needs to do that repeatedly, all the time, for an extended period. So in that sense, that asymmetry in censorship is what gives us hope that we can defeat most forms of constraints and limits placed upon us by various actors in the system. But that of course requires designing protocols in such a way that they consider this balance between who is censoring and who is not, and then designing into them the ability, for example, for delayed action, or whatever it might be, such that the system going offline for a while isn't catastrophic to the individuals benefiting from it. And this is one of the reasons why decentralized systems are so powerful compared to centralized systems: this idea of a single point of failure not being exploitable.

You were touching upon kind...

...of the beginnings of remote auditing, to just make sure data is available. Could you go into the differences in terms of the way that it's implemented in a peer-to-peer sort of way versus a maybe more centralized manner?

Here I think it's useful to remind ourselves again of the historical journey that we've been on. Long ago, hard drives were not very reliable. In fact, when I started playing around with these things, we'd use floppy disks, and let's say that five percent of the time your floppy disk failed, so you sort of naturally learned to deal with that. You would have multiple copies of things; researchers would come up with techniques, error-correcting codes and so on, so that you would add redundancy to systems at all levels. And then, when systems became too good, you were sort of no longer used to these errors happening. So when we build a distributed system, we need to bring back those lessons from when the hard drives were failing.

And so what we do is start reasoning about these systems statistically, first of all, so that if any single participant fails, we should still have a reasonable expectation of the data being available, without paying too much for it. This statistical approach gives us guarantees just like in the real world: a real hard drive can fail, it's just that it fails much less often than, let's say, a node going offline on the internet. So we need to translate that difference into a model that can deal with more failure than you'd typically expect in a centralized system. There are various techniques for doing that: you want to go into data replication, and you want to go into models where you can reconstruct the data from a small number of pieces instead of having to have all the pieces all the time. That's one aspect: making the system tolerate failure better.

The other thing is adding this economic model on top, where people are rewarded for performing the service even though the data per se doesn't interest them. And I think there's actually a third aspect, which is increasingly becoming important, which is this idea of plausible deniability. When you are storing data and getting rewarded for storing that data, you should ideally not know what's in there, because if you know what you're storing, you sort of become responsible for moderating that data, whereas if you simply have no ability to know what you're storing, the only transaction between you and the system is that you store some data and you get rewarded for it. You're sort of safe from that point, and you cannot censor the data. And all these things you can express as costs: if I know what I'm storing, that's a risk, and risk is associated with cost; if what I'm storing is valuable to me, that's a benefit, and then the price goes down. There are all these variables to play with.
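Jacek's "reconstruct the data from a small number of pieces" is the erasure-coding idea. Here's a toy sketch (mine, not from the episode) using polynomial interpolation over a prime field: treat k data values as points on a degree-(k-1) polynomial, hand out n > k evaluations, and any k surviving pieces recover all the original values.

```python
P = 2**61 - 1  # a large prime; all arithmetic is mod P

def eval_at(shares: dict[int, int], x: int) -> int:
    # Lagrange interpolation: evaluate the unique degree-(k-1) polynomial
    # through the given (xi, yi) points at position x.
    total = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

# Systematic encoding: data occupies positions 0..k-1, parity fills k..n-1.
data = [104, 101, 108, 108, 111]      # k = 5 original values
k, n = len(data), 8
base = dict(enumerate(data))          # shares at x = 0..k-1 are the data itself
shares = {x: eval_at(base, x) for x in range(n)}

# Lose any n - k pieces; any k survivors reconstruct every original value.
survivors = {x: shares[x] for x in [1, 3, 4, 6, 7]}
recovered = [eval_at(survivors, x) for x in range(k)]
assert recovered == data
print("recovered from shares", sorted(survivors), "->", recovered)
```

This is the same statistical trade Jacek describes: instead of full replication, you pay for n pieces but tolerate the loss of any n - k of them.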

Yeah, legal culpability is definitely one reason why some of these systems have come into being. Censorship is another one. I did a talk at DefCon, and the general concept of that talk is that these systems are being built to mitigate assholes. That's the whole concept of decentralized systems: within any community there exist assholes, and they're going to do things that are against the general good of the community, for selfish purposes or whatever; it doesn't matter what for, it's just against the general good of the community. And whenever you give them options to express that desire to go against the grain, they'll take them. The point of these systems is to limit their options as much as possible, such that we move from that old motto of "don't be evil" to "can't be evil."

And I find it interesting how these networks have grown over the years, from the original Bitcoin, where everyone does everything: we have full redundancy across every node in the system, and all of the data is required to be processed correctly, so we can have strong confidence that if I ask any individual node, the answer is going to be correct, because they all have to have all the data in order to operate; and we have full redundancy such that I can kind of load-balance, I can ask a few people and it should match, too. And that's a generally unscalable system if you think about the size of the world and the amount of data we're trying to store. So now we have different models, where we're no longer talking about everyone needing all of the data for the entire system to operate, and I have to think about how to incentivize participants as well as minimize their ability to be an asshole within the entire system. I think it's generally interesting that that's the direction we've gone, and yet we're still bringing in ideas we were learning about when the underlying hardware was faulty. That's our way of fixing it: using lessons we learned a long time ago.

Yeah. And a lot of these systems were built on the assumption that people would not be assholes while using them, right? The Internet used to be unencrypted, which certainly was interesting, but it also enabled a level of exploitation that would have made the system unusable had it been allowed to continue. And what changed, from the unencrypted Internet, was that we moved to an encrypted Internet. What we're changing the Internet to now is an encrypted and verified Internet, where not only are you excluding the ability of any intermediary on the path between you and your service provider to snoop on the communication, but you're also asking your provider to verify that they've carried out the task you assigned to them correctly. In the Internet before, you would send data through five or ten nodes before it got to its final destination, and you'd kind of trust that they would forward the data correctly, that they wouldn't tamper with it, that they wouldn't use the information therein for other purposes. And we saw that this wasn't quite true, so we started encrypting everything. And the next step now is that we also start verifying what the service provider is actually doing.
And Bitcoin and Ethereum are obviously one way of doing that, just an extremely expensive one, where, like you say, we have everybody verify that things were done correctly, even if it's just my five-dollar coffee transaction at the local coffee shop, which is of no interest to anybody but me and that coffee shop.

I'm trying to think about how to structure a question around where we go from here, toward generally verifiable, encrypted computation of transactions.

So I think the...

...way we're going is probably modularizing data, encrypting it, and making sure it's there, and then also modularizing the execution component to be general as well. That's kind of the order Jacek was talking about. So right now we're focusing on data, and I don't think anybody has yet come up with a solution where you verify that the data is there and distributed, without it having to be naively replicated among all the machines in the peer-to-peer network. But once you've optimized storing data that is incentivized by the user and encrypted, then we can maybe begin talking about retrievability guarantees as well. Because in terms of performance, I don't think peer-to-peer networks, at least from what I've seen in this space, have the same kind of performance that more centralized architectures have in terms of delivery of content, whether that be movies or music. There's no platform, as far as I know, that can say it's purely decentralized in terms of its infrastructure and still offers all of those guarantees.

Oh, there's actually an interesting story around that, which is basically that, around the time when BitTorrent was being invented, there used to be all these file-sharing networks, in particular in Sweden, where I grew up.

Kazaa and LimeWire and all of those? Yeah, Napster...

Yeah, there was Napster, there was Direct Connect, Kazaa was one, a lot of them. Kazaa in particular, I think, was run by a couple of dudes from Sweden, and they later went on to create Skype, and the initial versions of Skype were actually peer-to-peer. This was really interesting, because that was an efficient way to grow Skype: if I'm setting up a conversation between two nodes, there's no reason all the traffic from all the world should go via a single server. So what they did was establish a peer-to-peer network using pretty much the same technologies they had developed during the file-sharing era. So these things that started as, you know, underground networks found their way into commercial, established products as well. And then what happened next was that, probably, Skype needed to get into content moderation, and therefore they centralized everything back again.

But this brings us again to this topic of verifiable and encrypted. The moment you gain the kind of control over the system where you can moderate the content, that becomes, first of all, a risk: a legal risk, let's say GDPR and all these other aspects, where the mere fact that you're holding onto some data exposes you to risk. And then for me as a user, if I don't have to expose my data to these service providers, why should I? Because it exposes me to risk as well: the risk that, say in an economic transaction, somebody will front-run me, or that some entity that disagrees with what I'm doing will try to censor me. So just like before, where the Internet in general went from unencrypted to encrypted, I think, going forward, that's the journey we'll be seeing for more and more systems.
Blockchains in particular, where transactions are private by default; Zcash pioneered this in many ways. But now I think one of the big...

...things happening is really all the general-purpose zero-knowledge-based execution environments. And what's interesting is that the thing all these environments need is exactly an underlying layer of data storage that is reliable enough, and where the participants actually know nothing of the data they're storing.

I think it might be interesting to frame this in terms of data at rest and data in motion. For a good portion of history, the focus for adding encryption, or removing people's ability to tamper with things, has been on data in motion. We added encryption to the Internet because people were putting their credit card numbers into connections to the stores where they were buying e-commerce stuff, and the ability for someone to steal that information, which is just a series of numbers, and then use their credit card, was way too easy. So we needed to fix that, so that people couldn't snoop on these connections and do something with them, or at least so that the number of people who could do it was mitigated. And now we're at a point where, all right, that stuff is being stored somewhere, and those stores are now honeypots; the information as it's being stored is where the manipulation is happening. So now that we've got better ideas for encrypting data in motion, we need better ideas for encrypting data at rest.

And really, one of the best ways is to not store it at all: not store the data, but maybe store a representation thereof, or, as many of these emerging systems work, you store a proof of the data, or a proof of a particular aspect of the data. This is what zero knowledge is about. You no longer show everything to somebody who's interested in only a part of what you know; you just prove to them that you know it, or that you have a balance, or that a credit card belongs to you, but you never reveal the details, so they can never use what you gave them. And that way you eliminate a lot of risks for a lot of participants in the system. And this is probably also why many of us are so excited about it, because we'll get back to this world where we can transact without having to worry about the assholes, as you put it.
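One concrete reading of "the participants know nothing of the data they're storing": encrypt on the client before anything leaves your machine, so the storage provider only ever sees ciphertext plus a content hash. A minimal sketch using Python's `cryptography` package (my choice of tool; the episode names no specific one):

```python
import hashlib, os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

key = AESGCM.generate_key(bit_length=256)  # stays with the data owner, never uploaded
aead = AESGCM(key)

plaintext = b"my encrypted backup nobody else cares about"
nonce = os.urandom(12)
ciphertext = nonce + aead.encrypt(nonce, plaintext, None)

# The provider stores an opaque blob, addressed by its hash; it learns nothing
# about the contents, so it has nothing to moderate and nothing to censor.
content_id = hashlib.sha256(ciphertext).hexdigest()
stored = {content_id: ciphertext}

# Later: fetch by hash, verify integrity, decrypt locally.
blob = stored[content_id]
assert hashlib.sha256(blob).hexdigest() == content_id
recovered = aead.decrypt(blob[:12], blob[12:], None)
assert recovered == plaintext
```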
That opens up something I wanted to get at, though we're running out of time here. An alternative method for mitigating assholes is self-validation. You work on a few things in this area that allow people to not have to run heavyweight nodes to get the information they want: they can ask anyone they want and then verify it independently. Can you talk about that kind of alternative model a little bit?

Yeah, briefly, I guess. A lot of the technologies we've developed over the past few years are about compressing information down to something that is easily verifiable, and easily verifiable as it changes over time. At Status we have a couple of projects going on now where, for example, your wallet balance, a very simple thing, right? We went from a model where we would run a full Ethereum node on the phone, but then Ethereum grew too large, so in order to show the balance we had to use a third-party service to get it, and there wasn't really a good way of verifying it. And what's happened recently is that we've been able to find a compact representation, something that we can verify, that a provider of this information can supply, and it's very lightweight. And this is kind of the holy grail of all these systems: that with a...

...minimal proof that you can verify reasonably on commodity hardware, you can arrive at an answer as to whether this is really what the community agreed upon or not. There are many techniques and technologies involved in getting there, but I think this aspect of everybody being able to do it on normal devices is important for the democratization of these systems. The moment we build systems so complex that you need specialized equipment or hardware access in order to verify them, we've kind of lost the purpose of doing it. So this goes back to not having to trust the provider. We still have providers in these systems, and they still provide data, but the challenge is to find representations that people can trust, say a cryptographic proof that is simple enough that they can reasonably model it in their head, even if they don't understand the details of the math. It has to be approachable, explainable to a five-year-old, and it has to be generally available: it can't be too heavy or too expensive to do.

This may be something that you both know, and it may be a dumb question, but in terms of compression, how is that done in Ethereum, to give you a lightweight, verifiable proof that the transactions have been, I guess, batched and attested to?

So there are several pieces to that, actually. If we go back to the balance query: first of all, one of the foundational technologies in use right now is Merkle trees. A Merkle tree is basically a tree representation of a bunch of data that leads up to a single value representing all of that data, and that single value is compact. So your balance, together with all the other balances and account state and so on, is stored on every Ethereum node, and every Ethereum node can generate a tree of hashes that represents that data. And if you pick out the right nodes in that tree, you can verify that balance, having access only to the root of the tree.

So in order to verify the root of the tree, you need to know what the most recent state of the network is, and the way that's done in Ethereum right now is through proof of stake on the beacon chain. There's a bunch of validators; they come together; each individually looks at all the information that's been processed from their own point of view, and they select a particular block as being the current head of the network, the most recent state. And the change that happened recently in Ethereum that allows us to verify state data, this Merkle tree, is that we added something called the sync committee. The sync committee is simply a random selection of validators, five hundred and twelve of them, and in every block they vote for a particular block as being the head of the network. So with a delay of roughly fifteen seconds, we can know that a representative subset of all the validators considered a particular state to be the valid one. And we're playing a probability game here: if these five hundred and twelve say that this is the latest state of Ethereum, the chances of that being untrue are very low. And at the same time, five hundred and twelve validators, or public keys really, is something that any device out there can handle reasonably; it's almost like you could do it by hand.
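A toy version (mine, much simplified; Ethereum actually hashes a more complex trie structure) of the Merkle-proof check being described: the light client holds only the trusted root, the provider supplies the leaf plus the sibling hashes on the path, and the client re-hashes its way up.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Assumes a power-of-two number of leaves, for simplicity.
    level = [h(l) for l in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[bytes]:
    # Sibling hashes from the leaf up to (but not including) the root.
    proof, level = [], [h(l) for l in leaves]
    while len(level) > 1:
        proof.append(level[index ^ 1])  # sibling of the current node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(root: bytes, leaf: bytes, index: int, proof: list[bytes]) -> bool:
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Four "balances"; the light client trusts only the root (which, in Ethereum's
# case, is what the sync committee effectively signs off on), not the provider.
balances = [b"alice:10", b"bob:25", b"carol:7", b"dave:3"]
root = merkle_root(balances)
proof = merkle_proof(balances, 1)
print(verify(root, b"bob:25", 1, proof))    # True
print(verify(root, b"bob:9999", 1, proof))  # False: a lying provider is caught
```

The key property is that the proof grows with the logarithm of the data, so verifying one balance out of millions of accounts stays cheap enough for a phone.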

So there's this balance between how much data you need in order to verify and how much security you get out of it, and these sync committees, as they're called, the five hundred and twelve validators, represent a balance that is quite reasonable. So these five hundred and twelve sign onto a particular block as being the head, and that block contains the root of the Merkle tree. Therefore, if we ask the provider to give us a balance, and also what's called a Merkle proof, which is the part of the Merkle tree needed to verify that particular balance, we can compare what the validators thought with what the provider is trying to prove to us. And if there's a mismatch, we can highlight that to the user, and the user can take action. At least they can know that something is certainly part of the consensus right now, or that it is not, or at least that it's unknown whether it's part of the consensus and they need to perform further verification.

I think that's a great way to wrap up this conversation: a practical application of what we're trying to do now to give reasonable access to this kind of thing, and of the way we actually build these networks to provide some level of guarantee that something was done across distributed nodes, in a network that potentially has a bunch of assholes in it. So thank you for coming on the show and helping us walk through that. I think we've got a pretty good picture at this point. Is there anything else you would have liked us to ask that we didn't?

Oh, you know, there's so much going on in this space right now, we could talk for hours about it. But I think this journey of how we went from an untrusted Internet to kind of an encrypted Internet, and now a verified Internet, is really representative on a whole lot of levels, even at the banal level of, you know, looking at my balance in an app. It's really the same steps that we have to go through, both at an individual level and at a systemic level. And remembering the individual in these systems is, to me at least, very important, because that's who we're building these things for: for ourselves and for individuals. It's not for the system to work, but rather for people. It's important that we build systems that people who don't necessarily have all the knowledge can still trust and use without being abused.

Hashing It Out is working with Infinity Keys to let you claim a free listener NFT for this episode. You can find the Hashing It Out challenge on the infinitykeys.io puzzle page, or use the link in the show notes. Enter this week's passcode, fluffy, that's f-l-u-f-f-y, then claim the NFT on Ethereum, Avalanche, Polygon, or Optimism.

Yes. It's interesting. In the organization, Jacek is known as someone who talks slowly and makes pauses, but most of the time, when he says something, you want to listen, because it's usually important. And I think this was a good example of that: the way in which he explained the history of things, how it has led up to where we are now, and how we go from here; what the purpose of data is in a blockchain, where it goes, and who cares about it.
I thought that was just a very good narrative, and for me it's probably going to change the way I talk about blockchains in general when I'm explaining them to new people.

Yeah, it was nice listening to his historical take on how he was a teenager in the nineties using message boards, and how those changed over the years into more dynamic, centralized messaging services, and now we're kind of coming full circle, trying to make distributed networks have the same performance qualities that these centralized messaging providers have, but in a more modular...

...fashion, where we're abstracting out the different pieces of the system. Data was one component he talked to, and how it might scale. He mentioned that it would be better for the data not to be replicated naively by everybody within the system. So, for instance, it used to be that before light clients like Nimbus, you had to run the full client and have the whole blockchain synced up, and now you have the possibility of using Merkle proofs, the ability to run a client that attests to... how does it work, Corey, in terms of the attestations and the sync committee being a smaller subset of the entire validator set?

Yeah, so there's a subset of all the validators that basically come together to say, this is the canonical block, and that provides an additional feed for people to verify against. Right now, when you use something like Infura, you have no knowledge or guarantee that the data they're giving you is the right data from the blockchain. So you can take this small feed directly from the beacon chain and cross-reference it with the information Infura gives you, and have very, very strong confidence that the data is correct. So you no longer have to trust providers; you just tack on this little bit of validation provided by a subset of the beacon chain validators. This allows for light clients: you no longer need to run a full node to have basically the same level of security in the data you're getting from anyone.

I had a hard time figuring out what incentivizes people to hold other people's data; that was a really interesting part of the conversation. And then I started thinking about my own experience as a pirate back in the day, when I used to pirate music. Sorry, Ludacris, your album was free for me back in the day; sue me, please don't. But the incentive was to have the Ludacris album. So I would open up my computer, download it, and then I would seed some stuff so I could download stuff faster. It was like a game almost: how much stuff can I, allegedly, pirate. So what's the incentive for me right now? I think the incentive is there. If somebody's like, hey man, I want to pay you five dollars to open up, you know, a hundred megabytes on your hard drive, I'd be like, sure, no problem, five dollars; what, a month, a week, what are we talking about here? It's just digital storage. It makes a lot of sense. But as I heard Jacek, which, by the way, Jacek, you need a t-shirt that says "yeah, I'm sick", anyway, you don't have to make that face, Christian, it was an excellent joke. What I'm trying to articulate is: how is the problem not solved, when I feel like it's been solved for a long time? That's what I'm trying to understand, the difference. You know, I have Google Cloud, I can go anywhere and use that. I know it's not just on one server in a Google Cloud bunker; I know they've got servers everywhere, a bunch of them just bouncing that
data back around, back and forth. So why can't that be the recipe? But I'm not trying to make a big pot of spaghetti, I'm trying to make a little pot of spaghetti, so I need to cut all that stuff up into smaller pieces. Why can't we emulate that on a smaller scale, for just a node, and then, boom shakalaka? Obviously, you know, I'm missing some parts here, but it just feels like, okay, well, they've obviously got servers all over the place; the technology's there.

The whole point of this space is...

...to not have a singular entity that controls your data, though. That's the whole point. Nothing about security and privacy is easy, so people are trying to make solutions that are equivalent in terms of user experience but provide these kinds of guarantees: that you own your data, and that it's distributed among, you know, average Joes running nodes, versus Google or Amazon or Cloudflare or whoever. When you open up your machine, when you get that ticket that says, hey, can I pay you five dollars for a hundred megabytes of your hard drive, you've got to make damn sure that they don't get to know what's in that hundred megabytes, and, if I ask you, you have to be able to prove to me that you're actually holding it, that you didn't just say "hey, thanks for that" and then not do it. So in order to give those types of guarantees, you have to distribute that data with redundancy across a network of people doing this work, and do it in ways in which they don't get to know what they have, but they get to prove that they have what they're supposed to have. That way there's no way to censor any of that information, yet people who are contributing to the network are being paid commensurate to how much they're contributing. And that's really hard to do. It's easy if you can see all the data and parse through it and do it all in a centralized way; that's rather straightforward. But what we've seen is that when entities that do that don't necessarily like what you're doing, they take your data away; you no longer have that access.

And what you said beforehand, which I think is the crux of this transition from now to the future of decentralized data: with early blockchains, you trusted that a node gave you the right data because they had to have it, correct, in order to run the node in the first place. The process of being a data provider meant that you needed the data. But now, with the newer solutions, things like Filecoin or data distribution services, they're just running a data provider service and don't necessarily care about the data. So you need to incentivize them better, because they're basically going to be greedy: they're no longer running a node because they care, they're running it because they're making money, and the data they're providing has nothing to do with the process of running a node other than making money. Then you have to do it in a different way, such that they don't get to cheat, while you keep the same level of guarantees: you just say, hey network, give me my files back, and the network does it automatically, with a bunch of potentially malicious people inside that network trying to stop it.

Mm hmm. From the point of view of a user, though, I agree with you, Dee: they just want it to be cheaper and better than what currently exists, and easy to use.

Right, that's one aspect of it. There's another aspect that opened such a huge rabbit hole that I just kind of noped out of that thought, and that's: how do you protect this legally?
That's literally why I said: the moment you're able to do content moderation, meaning you can see the data and decide what you'd like to do with it based on what the data is, is the moment you're legally liable to handle that data appropriately based on the jurisdiction you're in. If you can't see it, you have no idea what it is, and there's no way to prove it, then you're no longer culpable for holding that data; you're just providing a generalized data service. I think. Not financial advice, and not legal advice either, because, yeah, even the lawyers at Status are looking into this right now, Dee. In terms of, like, if you have a piece of data that is, let's say, something like child porn, and you can't see the whole image, are you still culpable? And maybe you are in a future ten years from now; maybe the legislation will change to say that, yeah, if you're holding an encrypted piece, or an erasure-coded piece, you're contributing to the storage of some such media. But currently we're playing in a gray area where you should probably not be culpable.

Right. So we're trying to make systems that move from "don't be evil" to "can't be evil." I cannot be evil if I don't have the ability to manipulate the data; if I can't choose things, I have fewer things I'm able to do. And that gives stronger guarantees to the users, because...

...there's this huge network of pipes, this pipeline of the user interacting with the system, and the fewer options we give the middlemen to do things, the more guarantees we can give to the users: it's not going to be altered because they can't alter it, as opposed to "just trust them, they're good guys."

Okay, good chat.
