Neuralink looks to the public to solve a seemingly impossible problem

TheDudeV2@lemmy.ca · 6 months ago

dullbananas (Joseph Silva)@lemmy.ca · 6 months ago

Submit your algorithms under GPL

potatopotato@sh.itjust.works · 6 months ago

AGPL just in case they try to put your brain waves into the cloud

orclev@lemmy.world · 6 months ago

GPLv3, make it really radioactive to them.

dariusj18@lemmy.world · 6 months ago

Did they try Stack Overflow?

bus_factor@lemmy.world · 6 months ago

Why not skip the middle man and ask ChatGPT directly?

Billiam@lemmy.world · 6 months ago

*GrokAI

You know, Xitter’s shittier AI.

bus_factor@lemmy.world · 6 months ago

Fair, I was thinking in the context of Stack Overflow.

RobotZap10000@feddit.nl · 6 months ago

Why would you ever want to do that?! Marked as duplicated. Shove a cactus up your ass.

TheDudeV2@lemmy.ca · edit-2 6 months ago

I’m not an Information Theory guy, but I am aware that, regardless of how clever one might hope to be, there is a theoretical limit on how compressed any given set of information could possibly be; and this is particularly true for the lossless compression demanded by this challenge.

Quote from the article:

The skepticism is well-founded, said Karl Martin, chief technology officer of data science company Integrate.ai. Martin’s PhD thesis at the University of Toronto focused on data compression and security.

Neuralink’s brainwave signals are compressible at ratios of around 2 to 1 and up to 7 to 1, he said in an email. But 200 to 1 “is far beyond what we expect to be the fundamental limit of possibility.”
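
To make the "theoretical limit" idea concrete, here is a minimal Python sketch of the zeroth-order Shannon entropy of a byte stream. It is only an illustration: the function names and the sample data are made up, and this simple measure ignores correlations between samples, so it bounds only coders that treat each byte independently. It says nothing definitive about Neuralink's actual recordings.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Zeroth-order empirical Shannon entropy, in bits per byte."""
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_symbolwise_ratio(data: bytes) -> float:
    """Best ratio a byte-by-byte lossless coder could reach: 8 bits in, H bits out."""
    h = entropy_bits_per_byte(data)
    return float("inf") if h == 0 else 8.0 / h


# Hypothetical inputs: one heavily skewed stream, one perfectly uniform stream.
skewed = b"a" * 900 + b"b" * 100
uniform = bytes(range(256)) * 4

print(f"skewed:  {entropy_bits_per_byte(skewed):.3f} bits/byte")
print(f"uniform: {entropy_bits_per_byte(uniform):.3f} bits/byte")
```

The skewed stream carries well under one bit per byte, so it can shrink a lot. The uniform stream carries a full 8 bits per byte, so a byte-by-byte coder cannot shrink it at all.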

orclev@lemmy.world · 6 months ago

The implication of a 200 to 1 algorithm would be that the data they’re collecting is almost entirely noise. Specifically that 99.5% of all the data is noise. In theory if they had sufficient processing in the implant they could filter the data down before transmission thus reducing the bandwidth usage by 99.5%. It seems like it would be fairly trivial to prove that any such 200 to 1 compression algorithm would be indistinguishable in function from a noise filter on the raw data.

It’s not quite the same situation, but this should show some of the issues with this: https://matt.might.net/articles/why-infinite-or-guaranteed-file-compression-is-impossible/
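
The 99.5% figure is just arithmetic on the ratio. A quick sketch (the helper name is made up for illustration):

```python
def fraction_removed(ratio: float) -> float:
    """Share of the raw bits that an N:1 compressor does not have to transmit."""
    return 1.0 - 1.0 / ratio


# Ratios quoted in the article: 2:1 and 7:1 as plausible, 200:1 as the challenge target.
for ratio in (2, 7, 200):
    print(f"{ratio}:1 -> {fraction_removed(ratio):.1%} of the raw bits removed")
```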

Death_Equity@lemmy.world · 6 months ago

Absolutely, they need a better filter and on-board processing. It is like they are just gathering and transmitting for external processing instead of cherry picking the data matching an action that is previously trained and sending it as an output.

I’m guessing they kept the processing power low because of heat or power availability, they wanted to have that quiet “sleek” puck instead of a brick with a fanned heatsink. Maybe they should consider a jaunty hat to hide the hardware.

Gathering all the data available has future utility, but their data transmission bottleneck makes that capability to gather data worthless. They are trying to leap way too far ahead with too high of a vanity prioritization and getting bit for it, about par for the course with an Elon project.

Stardust@kbin.social · 6 months ago

There is a way they could make the majority of it noise - if they reduced their expectations to only picking up a single type of signal, like thinking of pressing a red button, and tossing anything that doesn’t roughly match that signal. But then they wouldn’t have their super fancy futuristic human-robot mind meld dream, or dream of introducing a dystopian nightmare where the government can read your thoughts…

Paragone@lemmy.world · 6 months ago

The problem isn’t “making the majority of it noise”,

the problem is tossing-out the actual-noise, & compressing only the signal.

Without knowing what the actual-signal is, & just trying to send all-the-noise-and-signal, they’re creating their problem, requiring 200x compression, through wrongly-framing the question.

What they need to actually do, is to get a chip in before transmitting, which does the simplification/filtering.

That is the right problem.

That requires some immense understanding of the signal+noise that they’re trying to work on, though, and it may require much more processing-power than they’re committed to permitting on that side of the link.

shrug

Universe can’t care about one’s feelings: making-believing that reality is other than it actually-is may, with political-stampeding, dent reality some, temporarily, but correction is implacable.

In this case, there’s nothing they can do to escape the facts.

EITHER they eradicate enough of the noise before transmission,

XOR they transmit the noise, & hit an impossible compression problem.

Tough cookies.

_ /\ _

QuadratureSurfer@lemmy.world · edit-2 6 months ago

NAND - one of the 2 you listed, or they give up.

Cocodapuf@lemmy.world · 6 months ago

I’m not sure that’s accurate.

Take video for example. Using different algorithms you can get a video down to half the file size of the original. But with another algorithm you can get it down to 1/4, and another can get it down to 1/10. If appropriate quality settings are used, the highly compressed video can look just as good as the original. The algorithm isn’t getting rid of noise, it’s finding better ways to express the data. Generally the fancier the algorithm, the more tricks it’s using, the smaller you can get the data, but it’s also usually harder to unpack.

orclev@lemmy.world · 6 months ago

It’s important to distinguish between lossy and lossless algorithms. What was specifically requested in this case is a lossless algorithm which means that you must be able to perfectly reassemble the original input given only the compressed output. It must be an exact match, not a close match, but absolutely identical.

Lossless algorithms rely generally on two tricks. The first is removing common data. If for instance some format always includes some set of bytes in the same location you can remove them from the compressed data and rely on the decompression algorithm to know it needs to reinsert them. From a signal theory perspective those bytes represent noise as they don’t convey meaningful data (they’re not signal in other words).

The second trick is substituting shorter sequences for common longer ones. For instance if you can identify many long sequences of data that occur in multiple places you can create a lookup index and replace each of those long sequences with the shorter index key. The catch is that you obviously can’t do this with every possible sequence of bytes unless the data is highly regular and you can use a standardized index that doesn’t need to be included in the compressed data. Depending on how poorly you do in selecting the sequences to add to your index, or how unpredictable the data to be compressed is you can even end up taking up more space than the original once you account for the extra storage of the index.
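
A toy version of that lookup-index trick, as a rough Python sketch. Everything here is hypothetical: the escape byte, the one-byte keys, and the way the index cost is counted are assumptions made for illustration, not how any real compressor works.

```python
ESCAPE = 0xFF  # assumed never to occur in the input; compress() checks this


def compress(data: bytes, phrases: list[bytes]) -> bytes:
    """Replace each occurrence of an indexed phrase with ESCAPE + one-byte key."""
    if ESCAPE in data:
        raise ValueError("input contains the escape byte")
    if len(phrases) > 256 or any(len(p) == 0 for p in phrases):
        raise ValueError("need at most 256 non-empty phrases")
    out = bytearray()
    i = 0
    while i < len(data):
        for key, phrase in enumerate(phrases):
            if data.startswith(phrase, i):
                out += bytes([ESCAPE, key])
                i += len(phrase)
                break
        else:  # no phrase matched here, so copy the byte through unchanged
            out.append(data[i])
            i += 1
    return bytes(out)


def decompress(blob: bytes, phrases: list[bytes]) -> bytes:
    """Exact inverse of compress(), given the same phrase index."""
    out = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] == ESCAPE:
            out += phrases[blob[i + 1]]
            i += 2
        else:
            out.append(blob[i])
            i += 1
    return bytes(out)


def total_size(blob: bytes, phrases: list[bytes]) -> int:
    """Compressed bytes plus the index itself (each phrase plus one length byte)."""
    return len(blob) + sum(len(p) + 1 for p in phrases)


regular = b"HEADER:0001;" * 50
regular_index = [b"HEADER:0001;"]
irregular = b"abcdefghijklmnopqrstuvwxyz"
irregular_index = [b"abc", b"xyz"]

for name, data, index in (("regular", regular, regular_index),
                          ("irregular", irregular, irregular_index)):
    blob = compress(data, index)
    assert decompress(blob, index) == data  # lossless round trip
    print(f"{name}: {len(data)} bytes -> {total_size(blob, index)} bytes including index")
```

The regular input shrinks a lot because one phrase covers all of it. The irregular input ends up larger than it started once the index is counted.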

From a theory perspective everything is classified as either signal or noise. Signal has meaning and is highly resistant to compression. Noise does not convey meaning and is typically easy to compress (because you can often just throw it away, either because you can recreate it from nothing as in the case of boilerplate byte sequences, or because it’s redundant data that can be reconstructed from compressed signal).

Take for instance a worst case scenario for compression, a long sequence of random uniformly distributed bytes (perhaps as a one time pad). There’s no boilerplate to remove, and no redundant data to remove, there is in effect no noise in the data only signal. Your only options for compression would be to construct a lookup index, but if the data is highly uniform it’s likely there are no long sequences of repeated bytes. It’s highly likely that you can create no index that would save any significant amount of space. This is in effect nearly impossible to compress.
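
This is easy to try with Python's standard `zlib` module. A small sketch, with sample inputs made up for illustration:

```python
import os
import zlib

random_bytes = os.urandom(100_000)      # uniformly random, like a one time pad
repetitive = b"neuron spike " * 10_000  # highly regular, 130,000 bytes

random_out = zlib.compress(random_bytes, 9)
repetitive_out = zlib.compress(repetitive, 9)

print(f"random:     {len(random_bytes)} -> {len(random_out)} bytes")
print(f"repetitive: {len(repetitive)} -> {len(repetitive_out)} bytes")

# Both are still lossless: decompressing gives back the exact input.
assert zlib.decompress(random_out) == random_bytes
assert zlib.decompress(repetitive_out) == repetitive
```

The random input should come out about the same size or slightly larger, while the repetitive one collapses to a tiny fraction of its original size.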

Modern compression relies on the fact that most data formats are in fact highly predictable with lots of trimmable noise by way of redundant boilerplate, and common often repeated sequences, or in the case of lossy encodings even signal that can be discarded in favor of approximations that are largely indistinguishable from the original.

Miaou@jlai.lu · 6 months ago

Ugh? That’s not what it means at all. Compression saves on redundant data, but it doesn’t mean that data is noise. Or are you using some definition of noise I’m not aware of?

TheDudeV2@lemmy.ca · 6 months ago

I can try to explain, but there are people who know much more about this stuff than I do, so hopefully someone more knowledgeable steps in to check my work.

What does ‘random’ or ‘noise’ mean? In this context, random means that any given bit of information is equally likely to be a 1 or a 0. Noise means a collection of information that is either random or unimportant/non-useful.

So, you say “Compression saves on redundant data”. Well, if we think that through, and consider the definitions I’ve given above, we will reason that ‘random noise’ either doesn’t have redundant information (due to the randomness), or that much of the information is not useful (due to its characteristic as noise).

I think that’s what the person is describing. Does that help?

Miaou@jlai.lu · 6 months ago

I agree with your point, but you’re arguing that noise can be redundant data. I am arguing that redundant data is not necessarily noise.

In other words, a signal can never be filtered losslessly. You can slap a low pass filter in front of the signal and call it a day, but there’s loss, and if lossless is a hard requirement then there’s absolutely nothing you can do but work on compressing redundant data through e.g. patterns, interpolation, what have you (I don’t know much about compression algos).

A perfectly noise free signal is arguably easier to compress actually as the signal is more predictable.
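
A rough way to check that last point in Python, using a made-up 8-bit sine wave and `zlib` as a stand-in for whatever compressor would really be used. The signal shape, noise level, and function name are all assumptions:

```python
import math
import random
import zlib


def sampled_sine(n: int, period: int, noise_amplitude: float, seed: int = 0) -> bytes:
    """Sample a sine wave into 8-bit values, optionally adding uniform noise."""
    rng = random.Random(seed)
    out = bytearray()
    for i in range(n):
        clean = 127.5 + 100 * math.sin(2 * math.pi * i / period)
        noisy = clean + rng.uniform(-noise_amplitude, noise_amplitude)
        out.append(max(0, min(255, round(noisy))))
    return bytes(out)


clean_signal = sampled_sine(10_000, period=100, noise_amplitude=0.0)
noisy_signal = sampled_sine(10_000, period=100, noise_amplitude=10.0)

clean_size = len(zlib.compress(clean_signal, 9))
noisy_size = len(zlib.compress(noisy_signal, 9))

print(f"clean: 10000 -> {clean_size} bytes")
print(f"noisy: 10000 -> {noisy_size} bytes")
```

The clean wave repeats, so a lossless compressor can describe it briefly. The noisy copy forces it to store the noise too, since lossless means every sample must come back exactly.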

Waldowal@lemmy.world · 6 months ago

I’m no expert in this subject either, but a theoretical limit could be beyond 200x - depending on the data.

For example, a basic compression approach is to use a lookup table that allows you to map large values to smaller lookup ids. So, if the possible data only contains 2 values: One consisting of 10,000 letter 'a’s. The other is 10,000 letter 'b’s. We can map the first to number 1 and the second to number 2. With this lookup in place, a compressed value of “12211” would uncompress to 50,000 characters. A 10,000x compression ratio. Extrapolate that example out and there is no theoretical maximum to the compression ratio.
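
That example written out as a small Python sketch (the names are just for illustration):

```python
BLOCK = 10_000
LOOKUP = {"1": "a" * BLOCK, "2": "b" * BLOCK}      # id -> full value
REVERSE = {value: key for key, value in LOOKUP.items()}


def decompress(ids: str) -> str:
    """Expand each lookup id back into its 10,000-character value."""
    return "".join(LOOKUP[i] for i in ids)


def compress(text: str) -> str:
    """Only works if the text is made entirely of the two known blocks."""
    if len(text) % BLOCK:
        raise ValueError("text is not a whole number of blocks")
    return "".join(REVERSE[text[i:i + BLOCK]] for i in range(0, len(text), BLOCK))


original = decompress("12211")
print(len(original))                          # 50000
print(len(original) // len("12211"))          # 10000
print(compress(original) == "12211")          # True
```

Note that `compress` raises a `KeyError` on any block it has not seen before, which is exactly the catch: the ratio is only this good because the set of possible values is known and tiny.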

But that’s when the data set is known and small. As the complexity grows, it does seem logical that a maximum limit would be introduced.

So, it might be possible to achieve 200x compression, but only if the complexity of the data set is below some threshold I’m not smart enough to calculate.

QuadratureSurfer@lemmy.world · edit-2 6 months ago

You also have to keep in mind that, the more you compress something, the more processing power you’re going to need.

Whatever compression algorithm that is proposed will also need to be able to handle the data in real-time and at low-power.

But you are correct that compression beyond 200x is absolutely achievable.

A more visual example of compression could be something like one of the Stable Diffusion AI/ML models. The model may only be a few Gigabytes, but you could generate an insane amount of images that go well beyond that initial model size. And as long as someone else is using the same model/input/seed they can also generate the exact same image as someone else. So instead of having to transmit the entire 4k image itself, you just have to tell them the prompt, along with a few variables (the seed, the CFG Scale, the # of steps, etc) and they can generate the entire 4k image on their own machine that looks exactly the same as the one you generated on your machine.

So basically, for only ~~a few bits~~ about a kilobyte, you can get 20+MB worth of data transmitted in this way. The drawback is that you need a powerful computer and a lot of energy to regenerate those images, which brings us back to the problem of making this data conveyed in real-time while using low-power.

Edit:

Quick napkin math:

For transmitting the information to generate that image, you would need about 1KB to allow for 1k characters in the prompt (if you really even need that),
then about 2 bytes for the height,
2 for the width,
8 bytes for the seed,
less than a byte for the CFG and the Steps (but we’ll just round up to 2 bytes).
Then, you would want something better than just a parity bit for ensuring the message is transmitted correctly, so let’s throw on a 32 or 64 byte hash at the end…
That still only puts us a little over 1KB (1078 bytes)… So for generating a 4k image (.PNG file) we get ~24MB worth of lossless decompression.
That’s 24,000,000 Bytes which gives us roughly a compression of about 20,000x
But of course, that’s still going to take time to decompress as well as a decent spike in power consumption for about 30-60+ seconds (depending on hardware) which is far from anything “real-time”.
Of course you could also be generating 8k images instead of 4k images… I’m not really stressing this idea to its full potential by any means.

So in the end you get compression at a factor of more than 20,000x for using a method like this, but it won’t be for low power or anywhere near “real-time”.
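
The same napkin math as a few lines of Python, taking 1KB as 1000 bytes and the larger 64 byte hash, which is what makes the total come to 1078:

```python
# Sizes in bytes, taken from the estimate above.
prompt_bytes = 1000        # up to 1k characters of prompt
height_bytes = 2
width_bytes = 2
seed_bytes = 8
cfg_and_steps_bytes = 2    # rounded up
hash_bytes = 64            # integrity check on the message

message_bytes = (prompt_bytes + height_bytes + width_bytes
                 + seed_bytes + cfg_and_steps_bytes + hash_bytes)
image_bytes = 24_000_000   # ~24MB 4k PNG

ratio = image_bytes / message_bytes
print(message_bytes)       # 1078
print(f"{ratio:.0f}x")     # 22263x
```

So "more than 20,000x" holds for these numbers, as long as the receiver already has the multi-gigabyte model.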

Cosmicomical@lemmy.world · 6 months ago

just have to tell them the prompt, along with a few variables

Before you can do that, you have to spend hours of computation to figure out a prompt and a set of variables that perfectly match the picture you want to transmit.

QuadratureSurfer@lemmy.world · 6 months ago

Sure, but this is just a more visual example of how compression using an ML model can work.

The time you spend reworking the prompt, or tweaking the steps/cfg/etc. is outside of the scope of this example.

And if we’re really talking about creating a good pic it helps to use tools like control net/inpainting/etc… which could still be communicated to the receiving machine, but then you’re starting to lose out on some of the compression by a factor of about 1KB for every additional time you need to run the model to get the correct picture.

Cosmicomical@lemmy.world · 6 months ago

You are removing the most computationally intensive part of the process in your example, that’s making it sound easy, while adding it back shows that your process is not practical.

QuadratureSurfer@lemmy.world · 6 months ago

The first thing I said was, “the more you compress something, the more processing power you’re going to need [to decompress it]”

I’m not removing the most computationally expensive part by any means and you are misunderstanding the process if you think that.

That’s why I specified:

The drawback is that you need a powerful computer and a lot of energy to regenerate those images, which brings us back to the problem of making this data conveyed in real-time while using low-power.

And again

But of course, that’s still going to take time to decompress as well as a decent spike in power consumption for about 30-60+ seconds (depending on hardware)

Those 30-60+ second estimates are based on someone using an RTX 4090, the top end Consumer grade GPU of today. They could speed up the process by having multiple GPUs or even enterprise grade equipment, but that’s why I mentioned that this depends on hardware.

So, yes, this very specific example is not practical for Neuralink (I even said as much in my original example), but this example still works very well for explaining a method that can allow you a compression rate of over 20,000x.

Yes you need power, energy, and time to generate the original image, and yes you need power, energy, and time to regenerate it on a different computer. But to transmit the information needed to regenerate that image you only need to convey a tiny message.

Cocodapuf@lemmy.world · 6 months ago

Neurons work in analogue data, I’m not sure lossless algorithms are necessary.

SharkAttak@kbin.social · 6 months ago

Why should we? What’s in it for us?

QuadratureSurfer@lemmy.world · 6 months ago

A job interview! (I wish I was joking).

The reward for developing this miraculous leap forward in technology? A job interview, according to Neuralink employee Bliss Chapman. There is no mention of monetary compensation on the web page.

Gsus4@mander.xyz · edit-2 6 months ago

Nothing, but then you could patent it and license it to anyone but elon :) are you motivated yet?

AngryCommieKender@lemmy.world · 6 months ago

You can have a free “flamethrower” cigarette lighter. The company is bankrupt, and Musk has a warehouse of the things he didn’t sell.

BobGnarley@lemm.ee · 6 months ago

I mean damn bro helping humans potentially walk again is a pretty big “for us” thing if you think about it in terms of humankind and not just yourself. Like imagine if someone were trying to cure cancer with the help of the public and you’re all like “well what the fuck is in it for ME though?”

cestvrai@lemm.ee · 6 months ago

Imagine we all pooled our resources to fund medical research through taxes only for private companies to exploit the technology and jack up the prices…

A brain implant for rich people isn’t necessarily “for us”.

SharkAttak@kbin.social · 6 months ago

Oh but I’m not saying this out of selfishness, the problem for me is not the cancer cure in itself, but who is doing the research…

the experiments on monkeys were questionable in method and nature, and led to death and madness;
the other chip installed in a human has already lost the majority of connection wires;
and not to forget, it’s not been specified how the public giving the ideas would benefit from it. Musk is not exactly known as the philanthropic kind.

drdiddlybadger@pawb.social · 6 months ago

That isn’t at all their problem. Their problem is scar tissue buildup that they haven’t even bothered addressing. Wtf are they doing talking about data compression when they can’t even maintain connection?

Cocodapuf@lemmy.world · 6 months ago

You really think they only have one problem to solve? If that were the case this would be relatively easy.

BarbecueCowboy@lemmy.world · edit-2 6 months ago

There were rumors of that and a lot of other complications in the animal trials. I don’t think we ever got proof, but a lot of irregularities that were explained away. Could be a lot more problems coming.

Modern_medicine_isnt@lemmy.world · 6 months ago

Cause there are always more patients… but more data will let them get more press when it enables more interesting demos.

AA5B@lemmy.world · 6 months ago

Already solved by evolution. This is the same problem as all of us have with visual data. We’ve evolved to need much less data transfer by doing some image processing first. Same deal. Stick some processors in there so you only need to transfer processed results, not raw data

Evotech@lemmy.world · 6 months ago

Did they try middle out compression?

Nomecks@lemmy.ca · 6 months ago

Listen Elon, I have three words that will blow your mind: Middle out compression!

AbidanYre@lemmy.world · 6 months ago

That’s a lot more civil than the three words I have for him.

Luvs2Spuj@lemmy.world · 6 months ago

He’s such a genius, why would he look for additional help? All these claims are such shit. Remember when Tesla would be fully self driving and we would all be whizzing around in tunnels? Fuck this guy.

BobGnarley@lemm.ee · edit-2 6 months ago

Tesla is a load of shit for sure, but SpaceX and this Neuralink, if it really does what it’s supposed to, actually contribute to humanity. Especially this.

Voroxpete@sh.itjust.works · 6 months ago

Brain machine interface development has been around for a lot longer than Neuralink. Musk is just better at getting his stuff into the headlines. Yes, the idea is good and beneficial to humanity, but then so are electric cars. That’s part of Musk’s grift. He latches onto something genuinely good and turns it into his pet project so that any criticism of how he does it can easily be deflected, because he’s automatically the good guy just for being there at all.

Hugh_Jeggs@lemm.ee · 6 months ago

I’ve got some of those bags you put your clothes in then seal with a vacuum cleaner, if that’s any use

palordrolap@kbin.social · 6 months ago

Surprised they haven’t tried to train a neural network to find a compression algorithm specifically for their sort of data.

There’s a ridiculous irony in the fact they haven’t, and it’s still ironic even if they have and have thrown the idea out as a failure. Or a dystopian nightmare.

But if it is the latter, they might help save time and effort by telling “the public” what avenues have already failed, or that they don’t want purely AI-generated solutions. Someone’s bound to try it otherwise.

orclev@lemmy.world · 6 months ago

They did, but then Elon insisted they add a virtual neuralink into it and now the neural network is braindead.

nifty@lemmy.world · 6 months ago

This seems more like a hardware issue than a compression algorithm issue, but I could be wrong

kibiz0r@midwest.social · 6 months ago

How do you send 200x as much data?

You don’t. The external system needs to run an approximation of the internal system, which the internal system will also run and only transmit differences.

There you go. Solved it. (By delegating to a new problem.)
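
A minimal Python sketch of that "both sides run the same model, only send the differences" idea. The predictor here is a deliberately dumb placeholder (it guesses that the next sample equals the last one), and the sample values are made up. A real system would need a far better model of the signal.

```python
def predict(history: list[int]) -> int:
    """Shared predictor that both ends run: guess the previous sample again."""
    return history[-1] if history else 0


def encode(samples: list[int]) -> list[int]:
    """Implant side: send only how far each sample is from the prediction."""
    history: list[int] = []
    residuals: list[int] = []
    for sample in samples:
        residuals.append(sample - predict(history))
        history.append(sample)
    return residuals


def decode(residuals: list[int]) -> list[int]:
    """Receiver side: rerun the same predictor and add the differences back."""
    history: list[int] = []
    for residual in residuals:
        history.append(predict(history) + residual)
    return history


samples = [100, 101, 103, 102, 102, 105]
residuals = encode(samples)
print(residuals)                     # [100, 1, 2, -1, 0, 3]
print(decode(residuals) == samples)  # True
```

The residuals are small numbers that need fewer bits than the raw samples, and the round trip is exact. How much this saves depends entirely on how well the shared model predicts the signal, which is the new problem being delegated to.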

SuperFola@programming.dev · 6 months ago

Just add 199 more transmitters

MonkderDritte@feddit.de · edit-2 6 months ago

They want to add compression to the implant?

And how does the brainwave data look? I’m sure they have some samples?

partial_accumen@lemmy.world · 6 months ago

They want to add compression to the implant?

They’re making their own silicon for their sensor so adding an on-die ASIC for a specific compression method sounds pretty attainable.

Cosmicomical@lemmy.world · edit-2 6 months ago

What does this have to do with the question? Having samples of the data they want to compress is fundamental if you hope to find an algorithm to compress 200x.

partial_accumen@lemmy.world · 6 months ago

What does this have to do with the question? Having samples of the data they want to compress is fundamental if you hope to find an algorithm to compress 200x.

There were two questions asked. I answered for part of the first question. I have no information on the second question (samples). You’re welcome to do your own googling to see if you can find an answer.