Hashing It Out

Episode 124

Data Infrastructure Pt. 1

ABOUT THIS EPISODE

Hashing It Out continues its series on blockchain infrastructure with the data layer.

In this episode Corey and Jessie talk to Dmitriy Ryajov and Eric Mastro from Status.

We don't really introduce ourselves, should we? They know who I am, they know me.

Welcome to the Hashing It Out podcast, where we talk to the tech innovators behind blockchain infrastructure and decentralized networks. Your hosts are Dr. Corey Petty, currently doing research at Status and waiting for other people to keep up; Jesse Santiago, a former electrical engineer now working on decentralized storage at Status; and, with the deep voice and the deep questions, Dee Ferguson. We're gonna tell you about how we went on a journey for slices of pizza and ended up in a strip club. And I'm the Hashing It Out showrunner, Christian. Continuing with our infrastructure series, this is part one of our episode on data. Please pardon the background noise, nasally voices, and mic issues; our hosts have been busy going to Devcon and then getting COVID, again.

Who are you and what do you do? Actually, let's bring Eric in on this as well. So, yeah, a quick overview of who you are, what you've been up to, and what you do today.

Sure. My name is Dmitriy. I've been working on decentralized technologies since around 2017. I first started contributing to libp2p, and then I moved on to work with MetaMask on a project called Mustekala, which was an early attempt to create a light client by sharding the Ethereum state. I then moved on to working for Status, where I wrote an implementation of libp2p in Nim that is currently used in an Ethereum client produced by Status, and in other clients produced by Status. And now I'm working on a project related to storage and data durability: decentralized storage and data durability. And, once again, another person that works with me: Eric.

My name is Eric Mastro. My background is mostly on the web2 side, building websites in .NET and all the awful products that come along with that. Actually, I like .NET. That's where I worked with Jed, who co-founded Status. And yeah, when Ethereum was getting popular I started tinkering around on the JavaScript side, just playing around a little here and there, investing a bit, getting interested in the ecosystem, and eventually I wanted to get involved officially. So I got a job with Status working on Embark, which is now pretty much dead; it was kind of a Truffle alternative. From there we transitioned to the desktop client team, which is still a work in progress, with beta releases coming out, hopefully. And then, about a year ago, or maybe nine months ago, I started working with Dmitriy on Codex. I'm currently working on the marketplace contracts: you know, buying and selling storage, the interactions there.

So, can you give a quick backdrop of the concept of data in blockchain networks and where it all started?

Sure. So blockchains operate on the premise that everyone in the network potentially shares the data, and this data is replicated over and over. That's the current model, at least. And this has been addressed by different solutions, sharding being one of them, for example.

I'm sure there are other ways of dealing with this too, but the point is that everyone has to copy this data over and over again. Doing that added security, but it also made the blockchain less scalable. A blockchain essentially keeps growing in size: even if you drop blocks, or the history, or the logs, or whatever structure you have in your particular implementation, you're always going to end up with more data accumulating, just because you need to keep some state around. And the more usage you get, the more it grows.

I always thought that was interesting, because when we started with the Bitcoin vision, it was this really cool way of saying: I can trace a piece of data back to its origination, I just have to keep tracking the blockchain and the growth of that blockchain. That seemed to be in line with hardware trends, so keeping track of this big thing was a reasonable thing to do, and for Bitcoin it's still a reasonable thing to say. When we switched to Ethereum and generalized the type of information we keep track of, that growth in data started to get really big. And then you start to think about the different networks doing the same thing, and the growth of those: the Binance chain, for instance, tries to be so fast that its growth is almost unmitigated. So is that a reasonable thing to try to do in blockchain networks, and why do we try to do it?

So the question is why we need to keep this data around for blockchains, where that comes from. Partly it's because it provided security for the blockchain, and there was really no other way of doing it. When you start to distribute this data across many other nodes in the network, when you take a blockchain and try to break it apart, things start going wrong, because you start sacrificing some of the implicit security guarantees that we have in these data structures. So you can't just naively chop a blockchain up into pieces and let it run. And obviously the growth of a blockchain increases with its usage: the more popular or the more useful the blockchain is, the more data it generates.

Is there a particular reason for the structure of blockchain data? Why is it a blockchain, why do we put things in blocks? What is the benefit of shaping the data structure this particular way, and what are some of the consequences of that choice as it grows and we try to access it?

Yeah. What we're trying to prove, essentially, is sequentiality over events. That's what's going on, and that's why the blockchain structure was chosen. It's the fundamental property on top of which the robustness of the blockchain is built.
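To make the "sequentiality over events" point concrete, here is a minimal sketch, purely illustrative and not taken from any client implementation, of a hash-linked chain: each block commits to the hash of the previous one, so reordering or rewriting history breaks verification.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class Block:
    height: int
    prev_hash: str   # hash of the previous block: this link is what enforces ordering
    payload: str     # stand-in for transactions / state changes

    def hash(self) -> str:
        # Hash the block's contents, including the link to its predecessor.
        data = json.dumps([self.height, self.prev_hash, self.payload])
        return hashlib.sha256(data.encode()).hexdigest()

def verify_chain(blocks: list[Block]) -> bool:
    """Check that every block correctly commits to the one before it."""
    for prev, current in zip(blocks, blocks[1:]):
        if current.prev_hash != prev.hash():
            return False
    return True

# Build a tiny chain: each new block points at the hash of the last one.
genesis = Block(0, "0" * 64, "genesis")
b1 = Block(1, genesis.hash(), "alice pays bob")
b2 = Block(2, b1.hash(), "bob pays carol")

assert verify_chain([genesis, b1, b2])

# Tampering with history (or reordering events) invalidates every later link.
b1.payload = "alice pays mallory"
assert not verify_chain([genesis, b1, b2])
```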
What was the second question?

Right: what are the consequences of choosing that structure? Now that we've basically decided on it, we all come to agreement using consensus, we all replicate the data across all the nodes based on that agreement, and we build this massive data structure, and we've found a good way of doing that continuously over a long period of time. So what are the consequences in terms of scale and access? Now that we have this thing, what do we do with it? As we scale it out, what's going to become a problem as these things become large and we try to use them? Does it get better or worse, and why?

Yeah, so I think it's going to get harder, unless we come up with fundamental solutions to these problems. These structures are incredibly powerful and useful, but they have limitations in how they scale, and the qualities we sometimes attribute to them don't necessarily hold forever. For example, take the historical state of a blockchain: you don't actually need the entire block history in order to run the blockchain. That's the fundamental realization in Ethereum and elsewhere. The idea is that you don't need the old history anymore, because it's not fundamental to the security of the chain, and you can start moving it off. If the old blocks are needed to provide functionality to your application or something like that, you can always store them yourself, or you can outsource that to a third party. So we are moving away from the assumption that the blockchain structure requires all the blocks, the entire history of the chain, in order to provide security guarantees, and into a world where blockchains are not necessarily storing all this data. They're focusing on one particular problem, which is coming to agreement on a particular value, which is essentially consensus, and leaving the problem of data outside of the protocol, allowing other solutions to come in and deal with it.

So some of the scaling solutions are literally telling other people: make whatever data you want and use the underlying consensus network to provide you some level of security. How does that work? How do you get security guarantees by chunking up and aggregating a bunch of data and then leveraging some consensus network to give you some level of security?

Yeah, so there are new techniques coming up. Verifiable computation with zero-knowledge proofs and things like that allow you to do this in a safe way, without sacrificing security. You could make the argument that this chain of blocks was required before we had that technology; now that we have this technology and this ability, we can safely start forgetting those old blocks.

I always thought it was interesting, too, that what we've optimized for when we build blockchain networks is the consensus mechanism. The way we pass around information and build the data structure is specific to consensus, which has this interesting trade-off: when we actually try to access that data, it's difficult, and it's terrible to query in any specific way. That's something that has really only come up with the success of Ethereum, as it got larger and larger: how difficult it is to build applications that use this information.
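As a tiny illustration of why the chain is awkward to query directly (hypothetical block and transaction shapes, not a real client API): to answer even a simple question like "what did this address do?", a node holding nothing but raw blocks has to scan the whole history, which is why separate indexers and query layers exist.

```python
from dataclasses import dataclass

@dataclass
class Tx:
    sender: str
    recipient: str
    amount: int

@dataclass
class Block:
    height: int
    txs: list[Tx]

def history_for(address: str, chain: list[Block]) -> list[tuple[int, Tx]]:
    """Naive query: walk every block and every transaction.

    The chain is ordered for consensus, not for lookups, so without a
    separate index this is O(total transactions) for a single question.
    """
    matches = []
    for block in chain:
        for tx in block.txs:
            if address in (tx.sender, tx.recipient):
                matches.append((block.height, tx))
    return matches

# Example: even a tiny chain requires a full scan per query.
chain = [
    Block(0, [Tx("alice", "bob", 5)]),
    Block(1, [Tx("bob", "carol", 2), Tx("carol", "alice", 1)]),
]
print(history_for("alice", chain))
```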
It's interesting that as we've built this thing out, which we've found to be useful, you end up with a scarce resource to put data into, but in the process of building it we've made it really hard to query, and we also fill it up really fast. So we have to figure out what it's actually good for.

If everyone uses the same space, what are the appropriate use cases for it? And what do you do with the ones that want to leverage blockchain networks but can't fit into that small, scarce block at a given time? What are we doing to fix that, now that we've basically filled these blocks up and we're starting to crowd out the use cases that can't fit into them?

Okay, I think there are several use cases here. If you're speaking specifically about a blockchain, this data that's being discarded, that the core layer says it doesn't care about anymore, is still necessary for some use cases, so essentially you rely on third-party providers to store it for you. And then the use cases for specific applications are very varied. Again, you can use centralized storage providers, which you probably don't want to do, or you can use decentralized storage providers, which are coming out but are potentially still not on par, let's put it like that, with centralized solutions. So we're in a kind of hard place right now: usage keeps going up, there are a lot of use cases, but the fundamental problem of where the data goes hasn't really been addressed.

Yeah, that's an interesting one, because if you look at rollups and layer twos on Ethereum, they have somewhat of a data availability problem after a given time period. The way a lot of the scaling solutions are going is: we'll provide you data availability for a limited period of time, and after that it's up to you to figure out what to do with it. So as you contextualize applications and the data associated with them, the base layer protocol, the actual thing that's giving you all the security, is saying: we'll do it for a little bit of time, and then you've got to figure it out. And I don't think we've come to a conclusion yet on how that gets fixed. What do we do with layer-two data?

What do we do with layer-two data? Okay, that's a great question.

Yeah. Take Ethereum as an example: the consensus network is providing some layer of secure data availability for layer twos to dump all their stuff into, and once it's gone, what do we do with it? How do we manage that extreme data growth, and what are the current layer twos thinking about to figure it out?

Well, what I've heard, and this might be true, is that layer twos are essentially saying they rely on the security provided by the L1, and you can centralize the storage of this data as much as you want; as long as you have the L1 to provide you with data availability, you're fine. That's the narrative I've heard. But obviously the L1s are also saying: we're only going to be providing this for a very small amount of time. We're talking about essentially days to weeks.

That's short. I thought it was longer.

It really comes down to capacity, throughput, and being able to run these things on commodity hardware, on Raspberry Pis. You don't want to increase the hardware requirements to the point where they become a centralizing factor.
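A rough sketch of what "data availability for a limited period" can look like in practice. The retention window, names, and data shapes below are assumptions for illustration, not any client's actual pruning logic: a node keeps full block bodies only inside the window and keeps just the small commitments beyond it, so older data has to come from somewhere else but can still be checked against the chain.

```python
import time
from dataclasses import dataclass, field

RETENTION_SECONDS = 18 * 24 * 3600  # assumed ~18-day window, purely illustrative

@dataclass
class StoredBlock:
    header_hash: str          # small commitment: always kept
    body: bytes | None        # full data: kept only inside the retention window
    timestamp: float = field(default_factory=time.time)

def prune(blocks: list[StoredBlock], now: float) -> None:
    """Drop block bodies older than the retention window, keep commitments.

    After pruning, anyone who still needs an old body has to fetch it from a
    third party (or a storage network) and can check it against header_hash.
    """
    for b in blocks:
        if b.body is not None and now - b.timestamp > RETENTION_SECONDS:
            b.body = None  # the data is no longer served by this node
```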
We could build these things with a lot more throughput and a lot more capacity by relying on beefier hardware requirements, but we don't quite do that, because it obviously has centralizing effects.

Let's actually dive into that, because I've always found it interesting, and it's part of that narrative of how this stuff scales. For the longest time people said: well, Ethereum does thirteen, fifteen, seventeen transactions per second, whatever it's doing, and that's not enough. Then you had this proliferation of networks that were trying to just ramp up transactions per second at the consensus layer, saying: we're fixing the problem, we're making these things scale by increasing the transaction throughput we can do. But in reality, one of the main bottlenecks was the IO you had to hit on the devices that were actually running those nodes. If you try to sync a network like that, different from Ethereum, you end up needing resources with incredibly fast storage access, and even NVMe drives don't cut it. So when you look at one of these attempts to scale up throughput, like the Binance chain, you can't keep up: you can start to sync the network and never get to the head of the chain, because you're throttled by how fast you can write data to the drive as you're syncing. You end up with this weird scaling issue that we didn't originally see, based on hardware limitations, which, like you said, adds a lot of centralizing effects to who gets to contribute to these things. Is that something you think networks are taking into account now, and what effect does it have?

Some networks are; others are relying more on just more capable hardware, essentially. So it depends on what your priorities are and who you're catering to. But yeah, I've seen both. Ethereum, in the Ethereum case, is trying to be more conscious on the hardware side and avoid this sort of centralization due to performance.

I've always found that interesting: what is a reasonable optimization here? How much should be required of a given consensus node operator in the process of contributing to a network? What's reasonable there?

Yeah, that's a good question. It depends. I think this comes down to specialization of roles, and that's yet another thing we're seeing happen in Ethereum, for example. There are trade-offs being made, for example with proposer-builder separation. The trade-off is that you have proposers that can be anyone, they don't require any specialized hardware, but the builder of the block is assumed to be running on industrial-type hardware. So that's one way of solving it; whether it's going to work or not, I'm not sure yet.

How do we get better at accessing it? Because, like I said earlier, we're building these things in such a way that we're maximizing our ability to add to the chain and increase that throughput: I can basically get synced to the chain quickly and then start participating in consensus. But the consequence is that if I'm trying to extract data from what's been done historically, it's really hard to do, so it's hard to build applications on top of these things.
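To put rough numbers on the "you can never catch up to the head" problem, here's a back-of-the-envelope sketch. The throughput, transaction size, and ingest figures are made up for illustration: if the chain grows faster than a node can process and write data, the sync time diverges.

```python
def time_to_sync(backlog_gb: float, ingest_gb_per_day: float, growth_gb_per_day: float) -> float:
    """Days needed to catch up to the chain head, or infinity if you never do."""
    net_progress = ingest_gb_per_day - growth_gb_per_day
    return float("inf") if net_progress <= 0 else backlog_gb / net_progress

# Assumed, illustrative numbers only.
tx_per_second = 300          # a high-throughput chain
bytes_per_tx = 250
growth_gb_per_day = tx_per_second * bytes_per_tx * 86_400 / 1e9   # ~6.5 GB/day of raw tx data

# A node that can ingest 50 GB/day eventually catches up on a 2 TB backlog:
print(time_to_sync(backlog_gb=2_000, ingest_gb_per_day=50, growth_gb_per_day=growth_gb_per_day))
# A node limited to ~6 GB/day (random-write bound, not sequential) never does:
print(time_to_sync(backlog_gb=2_000, ingest_gb_per_day=6, growth_gb_per_day=growth_gb_per_day))
```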
What are ways we can fix that, or what are people trying to do to address it?

Yeah, this comes down, again, to where the data goes. I think the problem of syncing up to the chain quickly can be addressed relatively well, and there are solutions out there.

I think Mina is one, or I'm not sure they still call themselves that, but there are definitely attempts to essentially chop off history using zero-knowledge proofs and then be able to sync up from a checkpoint very quickly. That's one solution. But in order to keep building applications that are useful, you still need data, and this data needs to exist somewhere; it needs to be available, and it needs to be secured and persistent. You need to have guarantees that the data is there and that you can rely on it. So getting to consensus doesn't necessarily require all the data, and I think the data availability guarantees that the L1 is providing to L2s could also potentially, eventually, move off chain. I'm speculating right now, so it gets a little murkier there, but it's one possibility.

That's interesting. Let's start back from the original narrative of Bitcoin: hey, we're building this decentralized network where I get to track the financial history of everything that has ever happened, and that's a good thing. Compare that to where we are today: what's the point of the data now, versus that original idea of having the history of everything that ever happened within the network? There's clearly a drastic shift from that narrative to what it is today. Is there a reason to ever try to get back to it?

Well, it depends on what the uses of the chains are. In the original case it was purely financial, so the only thing you cared about was that your balance is correct. That's really the only thing you care about. And in that context, the amount of data you generate is not necessarily that large, and you can eventually figure out ways of dealing with it, but it's still the minimum amount of data you need for the chain to be useful. In chains that are more general purpose, like Ethereum and others that have an execution layer, you intrinsically generate a lot more data. You can't deal with that in a naive way; you need to come up with some clever solutions, unless you want to centralize everything.

Yeah, the conversation of "is this important, does this thing need to live forever?" becomes a lot broader and more general when you have a bunch of different use cases happening. It seems reasonable to want to track the history of all financial transactions within something like Bitcoin, but when you start dealing with Ethereum, maybe I don't need my coffee purchase to go on chain and live forever.

I think that's a privacy thing too. You probably don't want to track the history of all financial transactions, but you do want to be certain that the balance you have in your wallet, or on chain, is the correct one.

That's an interesting distinction. The data structure itself, what we're aggregating, is validity, that something is true, versus actually being able to see, transparently, that thing happening. In the beginning, when we started with Bitcoin, we didn't have the ability to obfuscate the details, so we needed those details in order to get that validity. But what we're actually looking for is validation.
So now, as we move forward and that understanding starts to come into play, we're coming up with techniques that just focus on validation of the data for a given time period, and nothing about the transactions themselves.

I mean, from the way I understand it, and this is probably simplistic: if you have a value in state, you say, okay, I trust that value because I got it from a trusted checkpoint. But if you really wanted to prove that the data is correct all the way back from the beginning, you need the data for all the transactions. I'm just trying to answer the question directly about why we needed that data; I mean, that's why we did it that way. And for L2s, for rollups: you roll up some transactions and say, okay, this is trusted, and if someone wants to challenge that, you need all the data to prove it's true.

Well, yes, it depends. Again, we now have the technology to not need that. Zero-knowledge proofs, which in this case are not really about zero knowledge, they're more about verifiable computation, allow you to bypass that. You used to need that history in order to verify it and say: yes, this is the correct tip, the head of the chain, right now, at this particular point. But you can now bypass it.

If you bypass it, are you still relying on an assumption of truth somewhere?

Well, that assumption of truth comes from... you still need a little bit of history, right? You'd probably do this in checkpoints, and you'd still keep that history around for some time. But you can then generate, you know, weekly checkpoints, drop the rest, and generate them in a way that relies on the same intrinsic consensus of the chain.

That's interesting, because the way we try to give strong confidence in old data differs by how the rollups work. Let's just talk about rollups for now, with the Ethereum ecosystem as the general mindset. There's that shift from "I need to track everything that goes on in a blockchain" to "I need to make sure I can validate everything that goes on in the blockchain," which is kind of where we're going, and how you do a rollup dictates what you need to put on the layer-one data structure to get that. Optimistic rollups require you to have basically all the data available on the blockchain for a period of time, so that anyone on the rollup can say: that's wrong, I'm challenging it. ZK rollups do it in such a way that you're putting the proof there: the actual process is aggregating all that data, generating the proof, then putting the proof on the blockchain and validating it there. You don't need all that data; you just need the proof to exist, because if the proof exists, you can trust that everything happened correctly. That means you don't need as much history in order to feel that all the transactions happened appropriately. So I'm interested in that narrative change, from "I need to know what's going on" to "I need to make sure I can trust that what happened is good."

This is sort of erasing history, though. I mean, you are erasing history at some point. If you ever wanted to, I don't know, let's say roll back to a state or something like that, you'll no longer have it.
Yeah, that's true. But you're essentially assuming that you're not going to be rolling back to that state. At the point that you generate a checkpoint, you know with ninety-nine point whatever percent certainty, because this is probabilistic, that the checkpoint is good and that you won't have to roll back past it. So that last checkpoint is always going to be the one you rely on, unless, like you said, the application type requires something else.
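A toy illustration of the checkpoint idea just described (hypothetical shapes, not any client's sync logic): instead of replaying every block from genesis, a node accepts a checkpoint it trusts, however that trust is established, via a proof or the chain's own consensus, and only verifies the hash links from that checkpoint forward.

```python
import hashlib

def block_hash(height: int, prev_hash: str, payload: str) -> str:
    return hashlib.sha256(f"{height}|{prev_hash}|{payload}".encode()).hexdigest()

def verify_from_checkpoint(checkpoint_hash: str, recent_blocks: list[dict]) -> bool:
    """Verify only the blocks after a trusted checkpoint.

    Everything before the checkpoint has been dropped; its validity is assumed,
    which is exactly the trade-off discussed above: history is erased in
    exchange for cheap syncing.
    """
    expected = checkpoint_hash
    for b in recent_blocks:
        if b["prev_hash"] != expected:
            return False
        expected = block_hash(b["height"], b["prev_hash"], b["payload"])
    return True

# Example: a node trusts the checkpoint at height 100 and verifies only what follows.
cp_hash = block_hash(100, "f" * 64, "checkpoint state root")
b101 = {"height": 101, "prev_hash": cp_hash, "payload": "recent txs"}
b102 = {"height": 102, "prev_hash": block_hash(101, cp_hash, "recent txs"), "payload": "more txs"}
assert verify_from_checkpoint(cp_hash, [b101, b102])
```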

I guess probably ninety-whatever percent of application types aren't going to require that historical data, right? But are there some types that absolutely do?

Absolutely, absolutely. Even rollups require it. Rollups rely on the chain to provide data availability. So essentially, if you shorten the amount of history that you keep on chain, you're also intrinsically making rollups potentially more restrictive, especially in the case of optimistic rollups, because they have an exit period: you give some time to other validators in the network, or to yourself, to challenge the rollup and say, hey, maybe this transaction wasn't actually correct. So you rely on a time period in which you can provide those fraud proofs.

Does that mean that if, let's say, I wanted to create an L2, that L2's challenge window is limited by how long the data is stored?

Yes. It shortens your period for challenges. So, for example, if an optimistic rollup relies on a seven-day period for challenges, and we can only keep the data for three days, then you're limited to only three days, it doesn't matter what. So there's a limitation: you might have some rollup approaches that won't work in the future with this one-week period.

Let's talk about data, data availability and durability: what happens to data, what happens to rollup data, what happens to on-chain data?

Yeah, we were basically gearing up to that. We've painted the picture: the underlying L1 is providing a level of validation that something happened correctly, and all of the actual action is happening in these layer twos. What do they do with that data, and what does it mean to have data durability?

Yeah, so that's actually a very good question: what happens to rollup data? It isn't yet clear, at least to me, how rollups are going to deal with this problem of data, and really, rollups are inheriting all the same problems that the L1s had. All the L1s are doing here is saying: hey, we're not going to support all these use cases, we're going to focus on supporting L2s, and we're going to rely on the L2s to provide all the application-layer functionality. You probably don't want to run your application, your dapp, on top of an L1 anymore; you probably want to run it on an L2. But that poses the question: we haven't really solved this problem of where the data goes. What happens to the data on rollups? That is really a good question, and I haven't heard a good answer yet, although I've asked around. Again, I've said this before, but the assumption is that you can centralize an L2, essentially, as long as the L1 is decentralized. And that doesn't really sound correct to me. I mean, it might actually work, because it depends on the function of the L2, but if you want the L2 to mimic what it means to be Ethereum, then it feels essentially wrong.

Exactly. So, talking about narratives, that's the narrative that's being pushed now, and the narrative is essentially: as long as the L1 is decentralized, you can build an L2 that's as centralized as you want, and it's still going to be decentralized, because the data, or the centralized L2's state, gets committed to a decentralized L1.
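The constraint described above, that an optimistic rollup's challenge window has to fit inside the time the L1 actually keeps the data around, boils down to a one-line check. The numbers are the ones from the conversation plus one assumed retention window, used purely as an illustration.

```python
from datetime import timedelta

def challenge_window_is_safe(challenge_period: timedelta, l1_data_retention: timedelta) -> bool:
    """An optimistic rollup can only accept challenges while the underlying
    transaction data is still retrievable from the L1, so the challenge
    period must fit inside the L1's data-retention window."""
    return challenge_period <= l1_data_retention

# From the conversation: a 7-day challenge period doesn't work if the L1
# only keeps the data for 3 days.
print(challenge_window_is_safe(timedelta(days=7), timedelta(days=3)))   # False
# With an assumed longer retention window, the same rollup design is fine.
print(challenge_window_is_safe(timedelta(days=7), timedelta(days=18)))  # True
```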

Yeah. I mean, you're still pushing data on chain, and you're still generating quite a lot of data, but you can do it in a way that's compressed, and it doesn't necessarily grow as fast as it would with all the use cases that Ethereum, for example, was built to enable. But yeah, for the data that the rollups are generating, again, the excuse, or the narrative, is that you can centralize it.

Which is because you have confidence that what gets pushed to the L1 is valid.

Exactly, exactly right.

So what kind of historical guarantees do you get with an L2 if that validity check isn't sticking around on the L1 for very long?

I guess it depends, again, on which kind of L2 we're talking about. If it's a ZK L2, then the data, or the proof, that you commit on chain is either valid or not at the moment you commit it; there's really no two ways about it. With an optimistic rollup you do have a challenge period, and past that challenge period, once the data is gone from the L1, you can't really do much about it.

It's interesting, because we've gone from "everyone has all the data" to "no one has all the data" to "I don't even know where the data is anymore." And it's interesting to think about how this stuff moves forward as the general term "blockchain" keeps being applied to all of it.

Yeah, it's interesting. I think this data problem is incredibly pernicious, and we've found ways of convincing ourselves that it's not a problem anymore, but it is.

I mean, to be fair, not all things need to be tracked by all people, and the argument there, most of the time, is: if people care about it, they'll hold onto it, assuming they have a certain level of cryptographic guarantee that it's good once they archive it away or keep it some place. You only need that specific time period of ensuring it's valid when it happens. Once it's valid, you can archive it somewhere, presumably cryptographically signed in such a way that it can't be changed, with some level of storage, and presumably the people who care about it will take care of it. If nobody cares, it should go away. I think that's a reasonable thing.

It is a reasonable assumption, to some degree. But again, to me, the question is: where do you want to put this data? You might not need this data to come to consensus, but you might need it to run your decentralized application.

Let's think about it from a finance-application perspective. Say I want to do my taxes, and for some reason I have compliance issues that require me to understand my accounting over the past six years. That's way beyond what the Ethereum L1 will be keeping track of. So I need to be able to keep track of all the financial transactions I've made, say on an L2, over the past six years. How are we going to do that? That's where we end up, having given up on the questions of where this stuff goes, who cares about it, and how the storage gets paid for.

Right. So, traditionally, in centralized applications, that data would be kept by the institution. In this case it's supposed to be kept by the user, to some degree, or by the provider of the service in some shape. Those are two different things, but those would be the assumptions.
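A minimal sketch of the "archive it somewhere, cryptographically bound so it can't be changed" idea, using plain content hashing as a generic stand-in rather than any specific protocol: record the hash of the data while it's still being validated, and later check whatever you retrieve from an archive against that commitment.

```python
import hashlib

def commitment(data: bytes) -> str:
    """The small fingerprint you keep (or that stays on chain) indefinitely."""
    return hashlib.sha256(data).hexdigest()

def verify_archived_copy(retrieved: bytes, recorded_commitment: str) -> bool:
    """Whoever stored the data can't alter it without the mismatch showing up."""
    return commitment(retrieved) == recorded_commitment

receipt = b"2022-10-31: paid 0.002 ETH for coffee"
saved = commitment(receipt)            # recorded while the data was still "live"
print(verify_archived_copy(receipt, saved))                       # True
print(verify_archived_copy(b"2022-10-31: paid 0 ETH", saved))     # False
```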
So that has implications for the resource requirements of the people who are participating.

If I'm an individual and I need to keep track of my finances for the past six years, I need to make sure I have a computer that can handle it and keep up with these things over the course of those six years, because it's now my responsibility. I think that's kind of a drastic shift in the expectations placed on people who need to do these things.

Yeah, absolutely. It has shifted quite a lot, from "all of the data is on chain and all of the history is available" to "we don't care about the data and we don't care about the history."

And also from "there's a network out there that I can query" to "there no longer is, and if I didn't get the data, I no longer can, or I need to pay someone to do it."

Exactly right.

You made a distinction, when you were giving me an overview of what Codex does, about the concept of data networks: extrinsic versus intrinsic data. Extrinsic being something like a peer-to-peer file-sharing network, where the data being shared has nothing to do with the function of the network; intrinsic being where the data being shared is fundamental to the function of the network. In the past, most people had the intuition that the data is required in blockchain networks, so I can always just ask the network and it will give it to me. But the mentality shift is that that's no longer the case, and I need to start thinking about taking care of it myself.

Exactly. So we've shifted the narrative and we've made compromises in some functionality. But that narrative shift, I think, is really interesting.

And once again we're offloading responsibility to the user: giving them self-sovereignty, but also forcing them to have the appropriate computational resources to handle that, and to understand how to handle it.

Yeah. I mean, we're shifting it off the core protocol onto someone else. That someone else could be a user, or it could be an institution, or it could be, you know, a web3 company or something like that. But we're essentially saying: we're not dealing with this anymore.

As a necessity of scale.

Yeah, exactly.

I still want to talk about data durability and why we need it, specifically around the use cases where, okay, your data is now not on chain, it has to go somewhere, either you keep it or a service that you're using keeps it. But then the question is: if it's a service, is it keeping it with a centralized provider? Because if that's the case, then we don't have decentralized applications, we don't have decentralization, and we've kind of shifted back into the web2 world. So what is there to do?

Yeah, and then introduce data durability: what it's for, specifically, why you need it, why you want it, and how you get it. I've heard a lot of conversations where people differ on what they mean when they say they're building decentralized, durable data.

Yeah, and everyone's definition is terrible, exactly. Durability is a weird word. My usual answer is: okay, you know, if you put a file on BitTorrent, you have no guarantees it's going to be there tomorrow. If your file is not popular, no one's caching it, and it's gone. So if you want your file to be guaranteed to be there, you need durability guarantees.
Absolutely, yeah, that's a very good description, a simplified way of putting it, but that's a very good description, and that's actually exactly what it is. Well, I mean, durability is not necessarily just that, but the effect you get is what Eric just described.

You want to be able to put your stuff on some storage and then have some guarantees that you'll be able to get it later, and that "later" has some reasonable bounds, and it should be predictable. You should be able to say: in a year's time my data is still going to be there, because I paid for it, or something like that. I get some guarantees; some level of guarantee is the main thing. Whereas right now, again, as we described, you put your stuff on BitTorrent and you have no guarantees that your files are going to be there. It's mostly tied to popularity, and that's what determines how long the file is going to stay in the network.

I think the one thing I've found most interesting about Codex, or at least about learning about decentralized storage protocols, is that we can actually get the same level of guarantees that centralized storage providers give, and have it done on decentralized systems. There are obviously some trade-offs, but I found that interesting. Maybe we can elaborate on that a little bit: what trade-offs, how many nines can we get?

I'm not sure there are actually any trade-offs, to be honest with you. I think if you build this right, you can get away without making trade-offs, and you can actually fundamentally improve on what we have right now in terms of how we store data. I think that's the takeaway.

Right, because at the end of the day, when you think about what a centralized solution is, it's also a distributed system. It's just one company handling it, hopefully.

Yeah, exactly.

If you look at HPC solutions, at the Lustre file system, at anything where you have a shipload of data stored, it's spread across a bunch of different servers, and there's some type of aggregation mechanism for retrieving it, and some redundancy story for when hard drives fail, because at any given time bare-metal hard drives do fail. You have to have some level of guarantee that the data is distributed in such a way that when one drive fails, you don't lose it. That's why we have all kinds of RAID and non-RAID setups, XFS, ZFS, and all of these things. So when you get to a certain level of data, storage is already distributed. Then how do we change the model, so that it's not one company controlling it, it's the community who wants the data controlling it, and they all get paid commensurate with their contribution? It's not that far-fetched to believe you can make a similar or better experience.

I think that's the fundamental point. Yes, we do have distributed data solutions, and most data centers, hopefully, are distributed. But taking that a step further and making it decentralized is what has been incredibly hard, and that's what we've been trying to do for, I don't know, ever since we've had networks and file sharing. We've been chasing this thing where we want to get durability, but we're not necessarily very good at articulating it, either. I haven't seen it articulated very well in the literature or in the communities.
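One generic way to turn "I paid for storage, so my data should still be there in a year" into something checkable, sketched here as an illustration and not as a description of Codex's actual protocol, is a challenge-response proof over a Merkle tree of the file's chunks: the client keeps only the small root, then periodically asks the provider for a randomly chosen chunk plus its Merkle path.

```python
import hashlib, os, random

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(chunks: list[bytes]) -> list[list[bytes]]:
    """Bottom-up levels of the tree; levels[0] holds the leaf hashes."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate the last node on odd-sized levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Sibling hashes needed to recompute the root from the leaf at `index`."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append(level[index ^ 1])           # sibling of node i is i XOR 1
        index //= 2
    return path

def verify(root: bytes, chunk: bytes, index: int, path: list[bytes]) -> bool:
    node = h(chunk)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Client side: split the file into chunks, keep only the root, hand the chunks to a provider.
chunks = [os.urandom(256) for _ in range(8)]
levels = merkle_levels(chunks)
root = levels[-1][0]

# Periodic audit: challenge a random chunk; the provider must answer with the data plus its path.
i = random.randrange(len(chunks))
assert verify(root, chunks[i], i, prove(levels, i))            # provider still has the data
assert not verify(root, b"wrong data", i, prove(levels, i))    # provider lost or altered it
```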
One thing Dmitriy brought up when we started talking is that in the decentralized world, if a drive or a node goes down, you need to repair. You don't want to have to download the whole data set to do that, because that's taxing on a network when you've got terabyte-scale data sets. You can distribute the data in a way that doesn't require downloading the entire data set: you can use things like Reed-Solomon erasure coding, where you only need to get K blocks from the data set to repair the particular slots, or whatever it is, that you're hosting.

Yeah, although to repair it you'd still have to fetch about the same amount of data as the node that failed was holding of your original data, similar to the replication side.
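To make the "you only need K blocks" point concrete, here's a toy single-parity erasure code, the simplest possible stand-in for the Reed-Solomon codes mentioned above: k data blocks plus one XOR parity block, so any single lost block can be rebuilt from the k survivors without keeping full replicas of everything.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_blocks: list[bytes]) -> list[bytes]:
    """k data blocks -> k+1 blocks, the last being the XOR parity of the rest."""
    parity = data_blocks[0]
    for block in data_blocks[1:]:
        parity = xor_blocks(parity, block)
    return data_blocks + [parity]

def repair(blocks: list[bytes | None]) -> list[bytes]:
    """Rebuild a single missing block (marked None) from the k surviving ones."""
    missing = blocks.index(None)
    survivors = [b for b in blocks if b is not None]
    rebuilt = survivors[0]
    for b in survivors[1:]:
        rebuilt = xor_blocks(rebuilt, b)
    repaired = list(blocks)
    repaired[missing] = rebuilt
    return repaired

# k = 4 data blocks of equal size, plus one parity block, spread over 5 nodes.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
stored = encode(data)

# One node disappears; its block is rebuilt from the other four.
stored[2] = None
assert repair(stored)[:4] == data

# Real schemes (Reed-Solomon and friends) generalize this to surviving m of n
# losses, but repairing still means pulling several surviving blocks' worth of
# data over the network, which relates to the repair-bandwidth point above.
```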

Right, whereas the old-school way of just replicating data across the network probably works well enough for a centralized system.

Yeah, that's a good point. Most data centers are moving toward using erasure coding, because it is, plain and simple, better, and the computational overhead that existed in the past is slowly being pushed out with new codes, with new solutions to old codes, et cetera. So that's an interesting thing to talk about.

It's the network load that I think is interesting, because in the centralized solution, yeah, you just download the data set, it's fine. But with a decentralized solution you've got other things going on in the network that can bottleneck you, like talking to the DHT, for example, trying to find all the CIDs of a terabyte data set or something like that.

You have all the same problems that you have in a centralized data center, because at the end of the day they're distributed too. There are many things that make them different, but in this context, what makes a centralized solution different from a decentralized one is that you're delegating trust to one entity that controls all the hardware and doesn't have malicious actors inside its network. It might have some, but essentially it's a trusted network: you have faults, not Byzantine behavior; trust faults, not malicious actors. That's the difference. So we've had the technology to kind of build this, even though most people still use replication over erasure coding because of the computational overhead, though, as I said, that's shifting in the centralized world as well. Making a distributed network decentralized is where things start becoming hard very quickly.

That's kind of where we are, then. Thanks for coming on and talking about data. That's it.

Yeah, and we haven't even started talking about...

I think it makes sense, for the Codex-related stuff, to have a separate call specifically on what distributed data means, because the purpose of this episode is data as a function in blockchain networks. We moved into an interesting place, which I didn't expect: the point of data on blockchain networks is validation, and through the necessity of scale we've offloaded a lot of that stuff to alternative networks. That's where Codex becomes interesting: what are they going to do? We got to the point of asking what blockchain network data is for, and that's validation. That's it. The whole point of the blockchain network and the data associated with it is validation of the actions within that network. And once we have that, that's where Codex comes in.
It's like: well, I need to retrieve that stuff if I want the historical context. And that becomes the middle layer; the blockchain network is no longer part of that. You need some other distributed service, hopefully, if you want distributed applications, to help you retrieve that information for any historical part that's past the lifetime of what the consensus network actually provides.

Exactly.

I think that's the main thing to keep an eye out for in Part Two of the data layer.
