
"Government will spy on every call and e-mail"

If 60 million people all produced 90 datasets the size of 'Garf's post' (2kb) every day (which I consider a ludicrous exaggeration), you'd need less than 12 times that storage capacity (allowing for RAID config - 33%) per year - around 3.5 Pb.
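For anyone wanting to check the sum, here is a rough sketch of it. It assumes binary units (2kb taken as 2048 bytes) and, as a guess at what the "33%" means, treats the RAID allowance as a third extra capacity on top of the raw figure. The function name is made up for illustration.

```python
KIB = 1024
PIB = 1024 ** 5


def yearly_bytes(people, items_per_day, item_bytes, days=365):
    """Raw bytes generated per year, before any redundancy."""
    return people * items_per_day * item_bytes * days


# Figures from the post: 60 million people, 90 items a day, 2kb each.
raw = yearly_bytes(60_000_000, 90, 2 * KIB)
raw_pib = raw / PIB

# Assumption: "RAID 33%" read as roughly one third extra capacity.
with_raid_pib = raw_pib * 1.33

print(f"raw: {raw_pib:.2f} PiB per year")
print(f"with RAID allowance: {with_raid_pib:.2f} PiB per year")
```

On those assumptions the raw figure lands close to the 3.5 Pb quoted, with the RAID allowance pushing it somewhat higher.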

Going by the figure in the article for SMS messages (57Bn p.a.), and assuming each of those messages was 120 bytes long, you need just over 6 Tb (6370 Gb) to store the entire UK's output for a year.

If the 120 Tb array described in Bernie's 4 year old link is 'not much at all', how can anyone logically argue that it's somehow not technically possible? :confused:

Can you even still buy 72 Gb drives?
 
Going by the figure in the article for SMS messages (57Bn p.a.), and assuming each of those messages was 120 bytes long, you need just over 6 Tb (6370 Gb) to store the entire UK's output for a year.
People communicate a whole load more electronically now than 4 years ago and create far bigger electronic footprints. For example. 6.5 BILLION text messages were sent during May this year. In just the UK. And 46.52 million picture messages were sent in the same month.

And on New Year's Day 2007, a staggering 214 million text messages were sent. On average, 4.7 million messages are currently sent every hour in Britain.

And it's not just a case of dumbly collecting the data and shunting it all into bottomless hard disks. It has to be interrogated, filed, indexed, tagged, cross-referenced, checked, searched and backed up. Every single second of the day.

Think that's an easy job?


Source: http://www.text.it/mediacentre/
 
People communicate a whole load more electronically now than 4 years ago and create far bigger electronic footprints. For example. 6.5 billion text messages were sent during May this year. In just the UK. 46.52 million picture messages were sent in the same month.

And on New Year's Day 2007, a staggering 214 million messages were sent.

And it's not just a case of dumbly collecting the data and shunting it all into bottomless hard disks. It has to be interrogated, filed, indexed, cross-referenced, checked, searched and backed up. Every single second of the day.

Think that's an easy job?
I'm not sure what you're arguing, here. :confused:

Are you saying it's technically not possible, or disputing the maths, or what?

Or are you suggesting that 'dumbly collecting the data and shunting it all into bottomless hard disks' is what the system described in the link above does?

214,000,000 text messages is what, 24Gb?
 
Forgive re-quote, your post keeps changing.

People communicate a whole load more electronically now than 4 years ago and create far bigger electronic footprints. For example. 6.5 BILLION text messages were sent during May this year. In just the UK. And 46.52 million picture messages were sent in the same month.

And on New Year's Day 2007, a staggering 214 million text messages were sent. On average, 4.7 million messages are currently sent every hour in Britain.

And it's not just a case of dumbly collecting the data and shunting it all into bottomless hard disks. It has to be interrogated, filed, indexed, tagged, cross-referenced, checked, searched and backed up. Every single second of the day.

Think that's an easy job?


Source: http://www.text.it/mediacentre/

4.7 million per hour - that's, what, 5Gb?

If I was spending £12Bn on a system I'd probably want it to handle a bit more than 5Gb per hour of input.

6.5 Bn per month x 12 is still only 8.5Tb per year.
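The SMS figures in this exchange can be checked with a few lines. This is only a sketch: it assumes the same 120 bytes per message used above, counts in binary units, and ignores indexing, metadata and backups entirely.

```python
SMS_BYTES = 120  # assumed average message size, as in the posts above
GIB = 1024 ** 3
TIB = 1024 ** 4


def sms_storage(messages, bytes_per_message=SMS_BYTES):
    """Bytes needed to hold the given number of messages, payload only."""
    return messages * bytes_per_message


figures = {
    "per hour (4.7 million)": 4_700_000,
    "New Year's Day 2007 (214 million)": 214_000_000,
    "per year at 6.5bn a month": 6_500_000_000 * 12,
    "per year at 57bn": 57_000_000_000,
}

for label, count in figures.items():
    size = sms_storage(count)
    print(f"{label}: {size / GIB:,.1f} GiB ({size / TIB:.2f} TiB)")
```

Counted in bytes, the hourly figure comes out nearer half a gigabyte, so the 5Gb above is if anything on the generous side. The yearly totals match the 8.5Tb and 6370 Gb quoted.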
 
Sorry, I've not really been following the maths - surely though they'd just search for the keywords, and concentrate on the people throwing up keywords and their network of contacts? Wouldn't need a huge amount of storage to do that, I wouldn't have thought.

It wouldn't surprise me if they did it - mind you it still amazes me that google can do such thorough searches of such huge amounts of information in such short time.
 
1) Contact-network analysis

The Times story describes a twist on the existing arrangements under RIPA to require ISPs and phone companies to keep records of communications sessions, not content.

October 5, 2008
Government will spy on every call and e-mail
David Leppard

Ministers are considering spending up to £12 billion on a database to monitor and store the internet browsing habits, e-mail and telephone records of everyone in Britain.

The twist is the idea of a central repository. E2A: I'm sure that's been reported before, almost certainly in the Register.

The only new news I see in there is the price tag. And it looks like a rip-off compared to the size of the storage task. But maybe it's for queries, not storage.

Anyone here got any expertise in analysing undirected graphs, small world theory and stuff like that?

What I can immediately see them wanting to do with this data is to pick up editor, ask "who are his mates?" and get an answer sharpish.

Just getting a list would merely involve indexing the data - it'd be nice to hold the index in RAM.

But the more subtle and interesting question is: "how are his mates grouped?" - "Does it look like a cell structure?"

I don't know what the computing requirements for doing that rapidly are.
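To make the two questions concrete, here is a toy sketch. It assumes session records arrive as simple (caller, callee) pairs and groups someone's mates by whether they talk to each other once that person is left out of the picture. All names and the sample records are invented, and real graph analysis at national scale would need far more than this.

```python
from collections import defaultdict


def build_index(records):
    """Index (caller, callee) session records as an undirected adjacency map."""
    contacts = defaultdict(set)
    for a, b in records:
        contacts[a].add(b)
        contacts[b].add(a)
    return contacts


def mates(contacts, person):
    """Everyone the person has communicated with directly."""
    return set(contacts.get(person, ()))


def mate_groups(contacts, person):
    """Split a person's mates into groups linked by contact among themselves."""
    remaining = mates(contacts, person)
    groups = []
    while remaining:
        start = remaining.pop()
        group, frontier = {start}, [start]
        while frontier:
            node = frontier.pop()
            for other in contacts[node]:
                if other in remaining:
                    remaining.remove(other)
                    group.add(other)
                    frontier.append(other)
        groups.append(group)
    return groups


# Invented sample data: two pairs of mates who never contact the other pair.
records = [("editor", "a"), ("editor", "b"), ("editor", "c"),
           ("editor", "d"), ("a", "b"), ("c", "d")]
index = build_index(records)
print(sorted(mates(index, "editor")))
print(sorted(sorted(g) for g in mate_groups(index, "editor")))
```

The first question is a plain index lookup. The second needs a walk over the mates' own links, which is where the cost would grow with the size of the network.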
 
2) Content searching

On the other hand, techniques for searching truly huge quantities of data do exist.

They can do the search itself in hardware.

It works like this:

They tokenise the data - store tokens representing phonemes and words, not audio or ASCII. Right there, you have massive compression - look up "Huffman coding".

They stream the tokens through dedicated pattern-matching hardware. I'm too tired to paint a clear picture of how fast this is. Let's say the practical limit is always how fast you can throw data at it.
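A software stand-in for the idea, purely as illustration: words are replaced by small integer tokens, and the token stream is read once while a sliding window is compared against the pattern. This is not how dedicated hardware does it, it skips the Huffman-coding step, and the function names are made up.

```python
def tokenise(text, vocabulary):
    """Map each word to a small integer token, growing the vocabulary as needed."""
    tokens = []
    for word in text.lower().split():
        tokens.append(vocabulary.setdefault(word, len(vocabulary)))
    return tokens


def stream_matches(token_stream, pattern):
    """Yield start positions where `pattern` occurs, reading the stream once."""
    if not pattern:
        return
    window = []
    for position, token in enumerate(token_stream):
        window.append(token)
        if len(window) > len(pattern):
            window.pop(0)  # keep only the last len(pattern) tokens
        if window == pattern:
            yield position - len(pattern) + 1


vocabulary = {}
stream = tokenise("meet me at the usual place then meet me at noon", vocabulary)
pattern = tokenise("meet me at", vocabulary)
print(list(stream_matches(stream, pattern)))
```

Because the stream is consumed in a single pass with a fixed-size window, the matcher never needs the whole data set in memory, which is the property that lets the speed be limited by how fast data can be fed in.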
 
The benefit of this sort of database is connections, which are harder to work with than merely tokenising stuff, which itself still needs a hell of a lot of support hardware and software.

Suspect A has a disposable email account, you have activity from that account at X from location Y. Check all other activity in the local area at the same time, cross reference with other time periods when activity is working, cross reference with other suspect / known accounts, cross reference IP addresses with non UK locations, cross reference with encrypted content. Course you could just use it as a giant google program, but that'd be stupid. ;)
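The first cross-reference described above can be sketched like this. It assumes activity records are (account, location, timestamp in seconds) tuples and counts how often other accounts show up at the same place within a few minutes of the suspect account. The record layout, the ten-minute window and all the sample data are invented for illustration.

```python
from collections import Counter


def co_located(events, suspect, window_seconds=600):
    """Count how often each other account is active near the suspect's activity."""
    suspect_events = [(loc, t) for acct, loc, t in events if acct == suspect]
    counts = Counter()
    for acct, loc, t in events:
        if acct == suspect:
            continue
        for s_loc, s_t in suspect_events:
            # Same location, and within the time window of a suspect event.
            if loc == s_loc and abs(t - s_t) <= window_seconds:
                counts[acct] += 1
    return counts


# Invented sample data.
events = [
    ("throwaway", "cafe1", 1000), ("bob", "cafe1", 1100),
    ("carol", "cafe1", 5000), ("throwaway", "cafe2", 9000),
    ("bob", "cafe2", 9200), ("dave", "cafe2", 9300),
]
print(co_located(events, "throwaway").most_common())
```

An account that keeps turning up alongside the disposable one across different places and times floats to the top of the count, which is the sort of connection the post is describing.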
 
Forgive re-quote, your post keeps changing.

4.7 million per hour - that's, what, 5Gb?
Err, that's just the SMS messages.

Now add voice landline calls, emails, mobile calls, web traffic and other data and it's going to be a whole load more than 5GB.

Assuming it all works and can't be circumvented of course.
 
What I can immediately see them wanting to do with this data is to pick up editor, ask "who are his mates?" and get an answer sharpish.
Not so sharpish if his mates have been communicating through encrypted emails sent from different anonymous proxies via web cafes and chatting via disposable phone accounts.
 
Not so sharpish if his mates have been communicating through encrypted emails sent from different anonymous proxies via web cafes and chatting via disposable phone accounts.

Come on, those are all red flags that this system would use to help track people down. Encrypted emails from cybercafes? Instantly says that there's something interesting going on. Encrypted email from cafe from otherwise unused account to another unused account, hell it doesn't matter what's in the message you KNOW there's something serious going on.
 
Not so sharpish if his mates have been communicating through encrypted emails sent from different anonymous proxies via web cafes and chatting via disposable phone accounts.

yep they'd be wanting to connect it up to CCTV cameras with image recognition in that case - sending through encrypted e-mails would be suspicious though, if you've not got anything to hide why are you encrypting them?
 
Not so sharpish if his mates have been communicating through encrypted emails sent from different anonymous proxies via web cafes and chatting via disposable phone accounts.

If this 'editor' made a point about posters using anon proxies, and started being a cunt about it, he could be storing up a lot of problems for himself and posters.
 
That's how I read it, but then those powers are already partly in place under the RIP Act.

Pretty much, together with the 2006 EU Data Retention Directive. Session data - who are you calling/emailing when, which web sites you visit, etc.

As laptop observed, that's enough to get a good start for finding out what someone's up to and with whom. Encrypting the content of your emails doesn't make any difference to this kind of analysis, except perhaps to highlight the fact that you have something to hide.

The Data Retention Directive (as the name suggests) allows retrospective combing of these patterns. From what I've been able to glean from the media reports, the new(er) plans would allow the intelligence services and Police to do near-real-time monitoring and analysis of data traffic, data-mining for patterns that are indicative of the kinds of activity they're interested in.

Putting all the data from disparate sources (internet email, web traffic, mobile and fixed line call data, etc.) in one place makes the job somewhat easier.

Of course, once you have the monitoring (technical and legal) infrastructure in place, it makes it a lot easier to start monitoring content a bit further down the road...
 
If this 'editor' made a point about posters using anon proxies, and started being a cunt about it, he could be storing up a lot of problems for himself and posters.
You know what? I'm getting really fed up with you posting up your sneery little, off-topic digs, so if you've got an actual relevant point to make to this discussion rather than some vague personal beef or another, why not just spit it out?
 
Encrypted emails from cybercafes? Instantly says that there's something interesting going on. Encrypted email from cafe from otherwise unused account to another unused account, hell it doesn't matter what's in the message you KNOW there's something serious going on.
I'm not so sure that one email sent once from a webcafe (or cracked wi-fi network) with a message encrypted in, say, an image file is likely to be flagged up myself and even if it was, the sender will be long gone.
 
You know what? I'm getting really fed up with you posting up your sneery little, off-topic digs, so if you've got an actual relevant point to make to this discussion rather than some vague personal beef or another, why not just spit it out?

Show me where I've posted some kind of off topic sneery dig. Go on show me.
 
What's the policy on proxies / TOR etc here?

I'd have thought that for a forum with a 'protest' section this would be highly relevant.
 
I've always thought that posters here should be able to post using whatever kind of proxy they choose, without being picked out by you and other moderators for it. The point being that what you don't know you can't tell whatever the circumstances...
 
I've always thought that posters here should be able to post using whatever kind of proxy they choose, without being picked out by you and other moderators for it. The point being that what you don't know you can't tell whatever the circumstances...
Right. So it was indeed an off-topic little dig because you're personally unhappy with some unrelated aspect of moderation?

If you wish to discuss moderation concerning registration matters, take it to the feedback forum where I'm sure you'll get an answer.

But preferably leave out the "cunt" bit, please.
 
Not so sharpish if his mates have been communicating through encrypted emails sent from different anonymous proxies via web cafes and chatting via disposable phone accounts.

As Bob says, that'd immediately flag you up in a network analysis. "This one has lots of loose ends," the software will say. "Wonder why?" its users will ask.

And the only activity I can think of where the one steganographised message sent from a throwaway account that you mention would be relevant is that of a technically-sussed whistleblower.

Pretty much everything else requires at least repeated communication - lots of loose ends in your contact network.
 
As Bob says, that'd immediately flag you up in a network analysis. "This one has lots of loose ends," the software will say. "Wonder why?" its users will ask.
Only if it is powerful enough to look into emails where short text messages may have been concealed in zipped and encrypted images.

I'm afraid I don't share your faith in such a system.
 
Only if it is powerful enough to look into emails where short text messages may have been concealed in zipped and encrypted images.

I'm not talking about looking into content at all. The "loose ends" would be merely a matter of the system detecting that you had communicated with a lot of devices/accounts with unknown owners/affiliations.
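As a rough illustration of what "loose ends" could mean in practice, the sketch below takes a contact map and a set of accounts whose owners are known, and reports what share of a person's contacts fall outside that set. The data layout, names and sample values are all assumptions, and no content is examined at any point.

```python
def loose_end_ratio(contacts, person, known_owners):
    """Share of a person's contacts whose owner or affiliation is unknown."""
    peers = contacts.get(person, set())
    if not peers:
        return 0.0
    unknown = [peer for peer in peers if peer not in known_owners]
    return len(unknown) / len(peers)


# Invented sample data: who has been in contact with whom.
contacts = {
    "alice": {"bob", "carol", "dave"},
    "mallory": {"bob", "burner1", "burner2", "webmail-x"},
}
known_owners = {"alice", "bob", "carol", "dave", "mallory"}

for person in contacts:
    print(person, round(loose_end_ratio(contacts, person, known_owners), 2))
```

Someone whose contacts are mostly throwaway devices and accounts scores high here without a single message being read.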

I'm afraid I don't share your faith in such a system.

I'm not interested in faith.

I'm interested in discussing what the actual limits of such systems, plural, may be - logical, mathematical and hardware limits.

The first thing necessary to this discussion, it seems, is separating two families of system:

1) Contact-network analysis

2) Content searching

I've given my two posts above these headlines, for clarity.
 
I'm not talking about looking into content at all. The "loose ends" would be merely a matter of the system detecting that you had communicated with a lot of devices/accounts with unknown owners/affiliations.
So I send one email via an unsecured wi-fi business network, effectively burying the mail in with the legitimate traffic from that router. In that mail (encrypted in a JPG file or not) I give my associate a fresh email address - perhaps disguised as something else - to reply to. He does the same when he replies, while posting from one of the many unsecured/easily hacked wi-fi networks around the city. Neither of us use the same email address/network more than once.

How might the system detect a trend here?

I've no doubt that it's technically possible to construct some means of monitoring vast amounts of human data to discover trends, but I have immense doubts about how easy it would be to circumvent it, as well as massive doubts about its robustness and security.
 
Right. So it was indeed an off-topic little dig because you're personally unhappy with some unrelated aspect of moderation?

If you wish to discuss moderation concerning registration matters, take it to the feedback forum where I'm sure you'll get an answer.

But preferably leave out the "cunt" bit, please.

:rolleyes:
 
So I send one email via an unsecured wi-fi business network, effectively burying the mail in with the legitimate traffic from that router. In that mail (encrypted in a JPG file or not) I give my associate a fresh email address - perhaps disguised as something else - to reply to. He does the same when he replies, while posting from one of the many unsecured/easily hacked wi-fi networks around the city. Neither of us use the same email address/network more than once.

How might the system detect a trend here?

I've thought of two ways.

The question now is: can you think of them?

If you can't, your attempt at concealing the fact that you have communicated with your associate is likely buggered. :D
 