What Was Once Buried, Coming To Light... And Justice
Part 12 of Artificial General Intelligence (And Superintelligence) And How To Survive It
Deep-cover spies, as portrayed in The Americans and actually witnessed in Operation Ghost Stories, were once considered one of the hardest intelligence operations to uncover. My 1/5/2020 message to the FBI described how big data searches of records could be executed and analyzed by AI, and how we could further stalk every espionage operation in America by using artificial intelligence to sift vast archives of video evidence.
This may sound significant, and presumably was.
But remember two things.
One, this is using technology from the beginning of 2020, 3 1/2 years ago.
Part of the point of this journal is how far we’ve come in the last three years, and even three months. So what we can use now is far more powerful.
Two, if you can enrich your searches and analysis with multiple data sources - ideally, a legion of them - and integrate them well, the benefits you gain should grow exponentially, not linearly.
Or to put it another way, small contributions will have enormous effects.
And since 2020, we’ve had a lot of small contributions, and a lot of enormous ones.
Evidence abounds, and so does AI.
So let’s begin, but remember this is a glimpse into how far we’d come years ago - even for a casual observer on the Internet, providing tips and innovations in his spare time.
And where there is repetition below from excerpts you have already seen, just skim over it. I have left it there in full rather than break the flow. In case it isn’t obvious, there’s a lot to cover, even in the messages I’m willing to share in whole or in part.
Data Mining Deep-Cover Operatives Part 1 of 2 – Big Data Searches for Historical Gaps
Deep-cover spies can be unearthed through a review of records – during security reviews undertaken for top clearances, for example, or simply by using the digitization and/or integration of the records themselves as a reason to sift for and resolve anomalies.
We are looking for “the man who wasn’t there,” or the woman.
Specifically, we are looking for a past which does not exist, or which belongs to the person whose identity the spy usurped. Alternatively, we are looking for sufficient falsification by someone with an otherwise legitimate past who holds a critical post and may be obligated to hostile organizations, such as MSS assets collecting technical data.
We will be combining these big data searches of records with the automated sifting of videos and photos in which a younger version of the subject should appear, such as school pep rallies or yearbook photos.
If they are missing from an implausibly large number of photographic or video records with no proof of the past they have attested to – say, on a resumé or SF-86 – that may tell us something. The face of a different person, especially one no longer alive, will definitely tell us something.
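To make that sifting concrete, here is a minimal sketch of a "presence score," assuming the open-source face_recognition library and a hypothetical folder of digitized yearbook scans; the file paths, threshold and scoring are illustrative, not an operational tool:

```python
# Sketch: score how often a claimed identity appears in digitized archives.
# Assumes the open-source `face_recognition` library; paths are hypothetical.
from pathlib import Path
import face_recognition

def presence_score(reference_photo: str, archive_dir: str, tolerance: float = 0.6) -> float:
    """Fraction of archive images containing a face matching the reference photo."""
    ref_image = face_recognition.load_image_file(reference_photo)
    ref_encodings = face_recognition.face_encodings(ref_image)
    if not ref_encodings:
        raise ValueError("No face found in the reference photo.")
    ref = ref_encodings[0]

    images = list(Path(archive_dir).glob("*.jpg"))
    hits = 0
    for path in images:
        encodings = face_recognition.face_encodings(face_recognition.load_image_file(str(path)))
        # A hit if any face in the scan falls within the distance tolerance.
        if any(d <= tolerance for d in face_recognition.face_distance(encodings, ref)):
            hits += 1
    return hits / len(images) if images else 0.0

# Someone who claims four years at a school but never appears in its digitized
# yearbooks and pep-rally photos earns a suspiciously low score.
score = presence_score("claimed_identity.jpg", "yearbook_scans/")
print(f"Appears in {score:.0%} of archival images")
```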
Subversion methods that made sense 20 or even 10 years ago can be suicidal today.
Deep-cover Russian spies reputedly have the police "re"-issue "lost" IDs to provide a legitimate paper trail.
What happens if we check for previous IDs with Big Data?
This tapestry of treason interweaves many factors, data points and revealing "tells" its participants are utterly oblivious to. So even as it unravels, those who wove it find they have unwittingly snared themselves.
We need this scrutiny. And the justice which follows.
So let us consider a few other Big Data options. There are obvious markers, like Russian pensions used to pay assets and operatives. Or just anyone caught up in the investigation. Or anyone showing up on markers from cryptocurrencies to Eric Garland's files on media subversion to Kremlin trolls based in the West.
And, of course, as people show up as suspects, you can simply check into the background of anyone whose sketchy past is literally sketchy.
Almost like a vague narrative of a murky past, rather than an actual past.
With those more obvious suspects, investigators can again simply see if the paper trails for those individuals go all the way back.
Or if they jump many years and then dead-end in a child's gravestone in an out-of-the-way cemetery.
But in the meantime, when running mass searches... Does someone list obvious data points on a resume? Can they be checked? Did they ever go to that high school or university? What tax records exist for them? No, not what they paid, but whether there's a tax record at all.
Are certain Social Security numbers getting issued at strange times, or not showing up on records since infancy, despite a supposedly long career? The key thing - beyond the people already on the radar - is to find markers that already exist in accessible databases.
Most people aren't home schooled, cash-only day laborers. Especially not in professional careers.
Those who are, or who legitimately spent time doing that, should be easy to spot.
So obvious markers include re-issued records, tax filings, Social Security number usage, gaps in a verifiable work history - which again goes back to tax records on most legitimate businesses – and gaps in educational history.
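A minimal sketch of the kind of gap check those markers imply, assuming the relevant records have already been digitized and pulled together; every field name and threshold here is a hypothetical placeholder rather than any agency's actual schema:

```python
# Sketch: flag historical gaps for a claimed identity.
# All record sources, fields and thresholds here are hypothetical placeholders.
from dataclasses import dataclass, field

@dataclass
class ClaimedHistory:
    ssn_issued_year: int
    birth_year: int
    claimed_employers: list          # employer names from a resume or SF-86
    claimed_schools: list
    tax_filing_years: set = field(default_factory=set)   # years with any filing at all
    school_records: set = field(default_factory=set)     # schools with a matching enrollment record
    reissued_ids: int = 0            # count of "lost" IDs re-issued by police

def gap_flags(h: ClaimedHistory, career_start: int, career_end: int) -> list:
    flags = []
    # An SSN first issued well into adulthood is anomalous for a long domestic career.
    if h.ssn_issued_year - h.birth_year > 18:
        flags.append("SSN issued in adulthood")
    # Years inside a claimed professional career with no tax record at all.
    missing = [y for y in range(career_start, career_end + 1) if y not in h.tax_filing_years]
    if len(missing) > 2:
        flags.append(f"{len(missing)} career years with no tax record")
    # Schools on the resume with no enrollment record anywhere.
    unverified = [s for s in h.claimed_schools if s not in h.school_records]
    if unverified:
        flags.append(f"unverified education: {unverified}")
    if h.reissued_ids >= 2:
        flags.append("multiple re-issued IDs")
    return flags
```

Run over an entire roster, a check like this surfaces "the man who wasn't there" automatically; where the thresholds sit is the analyst's call.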
Are there fields such as media, politics, sci-tech research and others where someone could have access to significant information and/or influence without the scrutiny of a top-secret-clearance background check?
Again, if you reduce these searches to some embarrassingly parallel problems as you sift records you already have and which are often already digitized, you may find interesting correlations. If law enforcement needs probable cause, they can clearly start with people already under review for the many crimes being unearthed in the Russia investigation or other major investigations.
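"Embarrassingly parallel" simply means each identity can be checked independently of every other, so the work splits cleanly across however many cores or nodes are available. A minimal sketch using Python's standard multiprocessing pool, with the check itself left as a placeholder:

```python
# Sketch: run independent identity checks in parallel on one machine.
# check_identity() is a placeholder for record-gap logic like the sketch above.
from multiprocessing import Pool

def check_identity(person_id: str) -> tuple:
    # Placeholder: pull this person's records and return any anomaly flags.
    flags = []  # gap_flags(...) would go here
    return person_id, flags

if __name__ == "__main__":
    roster = [f"person-{i:06d}" for i in range(100_000)]   # hypothetical roster
    with Pool() as pool:                                   # one worker per CPU core by default
        for person_id, flags in pool.imap_unordered(check_identity, roster, chunksize=1000):
            if flags:
                print(person_id, flags)
```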
Counterintelligence presumably can search most government records already if they have cause. Obviously, I already shared a means for leveraging a host of embarrassingly parallel problems with a simple data architecture for potential exascale and/or basic quantum supercomputing with the FBI. To quote that original message: “Obviously investigators will want a tremendous amount of processing power to expedite this research. One way to put together a computer with immense processing power for a relatively narrow set of tasks would be a computer cluster, whether a conventional cluster or one incorporating or entirely assembled from the basic quantum processors now available.
“I have an invention which may provide extremely inexpensive supercooling for an Aiyara cluster, other Hadoop cluster or any other architecture (even cheap 3D chips) and, as a side-effect, could make cooling and shielding quantum processors from outside disruption far more practical. This invention is not yet patented or even provisionally patented given concerns over how easily its main functions could be weaponized. The supercooling innovation is a step in enabling its main applications, but not the only point of the invention.
“Cheap, high-density power storage and portable electrical generation are among the key applications (almost a given when low-temperature superconduction becomes trivially easy) which are also relevant to this issue. As noted by the Department of Energy, “System power is the primary constraint for the exascale system.”
“The invention in question thus appears to solve the largest problem facing the development of exascale systems as well as facilitating the practical use of quantum computing. Power and cooling are common issues with any of these systems. On a more conventional scale, an Aiyara cluster normally needs to be low-power to function. There is an obvious cooling problem with having so many processors so close together. Even without those innovations or existing exascale systems, the means to decrypt this information on a large scale over a short time certainly exists.
“While the supercomputers employed in this task do not have to be Aiyara clusters or even clusters at all, they are an inexpensive way to mobilize immense amounts of processing power to a specific task. They can also be isolated from the rest of the world, hardwired to and only interacting with the computers forwarding them tasks and receiving their results, and can be built with very specific work in mind.
“A more interesting twist to this would be to create a cluster incorporating or even primarily consisting of the basic quantum processors we do have, to dramatically accelerate our ability to factor prime numbers and decrypt common encryption (such as Tor onion encryption) on a massive scale.
“Again, what I have in the way of supercooling could be very helpful in this regard, though I was attempting to tackle the much harder problem of neutralizing the weapons enabled by that technology (by high-density power production and storage, among other things) before releasing it to anyone.
“If it is needed for a law-enforcement action on this scale, however, I will see what I can do.
“Inexpensive, large-scale supercooling requiring minimal infrastructure would enable a massive cluster to be assembled without heating issues and with lower power requirements, allowing you to work more easily at scale and offering greater latitude in the processors that could be incorporated. By making it easier to shield quantum processors with far less infrastructure while assembling them in much denser architectures it could also make a quantum-based or quantum-incorporating cluster much more feasible, though whether the large number of processors you might prefer would actually be commercially available is another question. 3D chips should also be vastly easier under this technique, though obviously these must first be engineered as a practical option given present technological limitations and so, ironically, may be the last element to become available.
“Clusters paired with existing supercomputers and any quantum capacity presently available should already be adequate for this mass decryption given time, but other technologies are clearly imminent even without further major breakthroughs.
“Other innovations may emerge depending, again, on who is investigating and where the targets of that investigation may be.
“Counterintelligence may find the unsecured computers drawn into a botnet remain vulnerable to counter-hacking and, where they have the legal authority to do so, may be able to unravel them in a mass hack, possibly backed by the assets cited above. A foreign-intelligence controlled botnet of foreign computers might be temporarily turned into a botnet under investigators’ control as part of a counterintelligence operation, if only in using their own processors to search them for malware and data breadcrumbs. Mass hacking a foreign criminal darknet may be enabled by the ability to mass decrypt using clusters and/or quantum processing.”
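For the cluster pattern described in that message – worker nodes hardwired to a head node, doing nothing but receiving tasks and returning results – here is a minimal sketch using Python's standard multiprocessing managers; the address, port, key and the tasks themselves are all placeholders:

```python
# Sketch: head node serving a task queue and a result queue to hardwired workers.
# Address, port and authkey are placeholders; another process enqueues the search jobs.
import queue
from multiprocessing.managers import BaseManager

task_queue, result_queue = queue.Queue(), queue.Queue()

class ClusterManager(BaseManager):
    pass

ClusterManager.register("tasks", callable=lambda: task_queue)
ClusterManager.register("results", callable=lambda: result_queue)

if __name__ == "__main__":
    manager = ClusterManager(address=("", 50000), authkey=b"placeholder-key")
    server = manager.get_server()
    server.serve_forever()   # workers on the isolated network connect, pull tasks, push results
```

Each worker on the same isolated network then connects, pulls work and pushes results:

```python
# Sketch: a worker node on the same isolated, hardwired network.
from multiprocessing.managers import BaseManager

class ClusterManager(BaseManager):
    pass

ClusterManager.register("tasks")
ClusterManager.register("results")

if __name__ == "__main__":
    manager = ClusterManager(address=("head-node", 50000), authkey=b"placeholder-key")
    manager.connect()
    tasks, results = manager.tasks(), manager.results()
    while True:
        job = tasks.get()          # blocks until the head node forwards work
        results.put((job, None))   # placeholder: run the assigned search here
```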
But in the meantime, do you really need to justify asking your contractors for the basic background of people working on even remotely critical government projects, and then running a search on all of it? Again, narrow to the most critical niches, check your people and expand from there. There are presumably large databases you don't have to justify checking at all.
Military, intelligence and Federal law enforcement background checks, for example, or critical government contractors with access to sensitive information. We do not want to examine people recklessly or intrusively. But we clearly have an issue with long-term subversion, and enough routine or ongoing investigations to provide starting points for unraveling this web of potential deep-cover spies. And while I hope we will have decimated those networks permanently by the end of this saga, I see no reason to leave any stone unturned.
Data Mining Deep-Cover Operatives Part 2 of 2 – Automated Analysis of Relevant Photographic and Video Evidence
My 11/3/19 message on using video to uncover spies and operatives is also applicable to this issue, in terms of automating the analysis of older photographic and video evidence examined when validating a potential deep-cover asset’s past life:
“On 10/23/19, I sent the FBI some suggestions on how molested children in seized, illegal Darknet videos could be identified by an automated process combining more conventional generative adversarial networks with some additional, potentially dramatic improvements, and on 10/29/19 I expanded on those improvements, and noted some additional applications for the radical enhancement of facial recognition technology and how to implement them.
“Notably, I explained how merging frames and optical flow could be employed to enhance images and how generative adversarial networks could be trained to spot persons of interest from seemingly random crowds.
“I would like to expand on and clarify these points, as I may not have been clear on how the de-noising of one frame from a video could be expanded automatically to include enhancing all frames – and hence the entire video – how video could be harvested and combined from multiple sources, and how audio could also be collected, filtered and merged – where available – with these sharpened videos.
“To quote from that 10/29/19 message, where I am in turn quoting from my 10/23/19 transmission:
“““…the possible means to identify virtually all of the victims and perhaps even the perpetrators whose faces can be glimpsed in any of these materials, regardless of what age they are now, how bad the images are, features caught at odd angles and in fragmentary fashion, and so forth.
“““Generative adversarial networks are a method of machine learning commonly applied to problems such as enhancing images and detecting deep fakes. Two neural networks challenge each other in a game, each trying to “outwit” the other. They are given a training set as an example, and subsequently learn to generate new data sets meeting the same statistical parameters as the training set.”
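To make the two-network "game" concrete, here is a minimal training-loop sketch, assuming PyTorch; the toy image size, layer widths and learning rates are illustrative and nothing like a production image enhancer:

```python
# Sketch: the generator/discriminator "game" in its simplest form (PyTorch assumed).
# A real image-enhancement GAN would be far larger; this only shows the training dynamic.
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 64 * 64          # illustrative sizes

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)
loss = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images: torch.Tensor):
    """real_images: tensor of shape (batch, image_dim) scaled to [-1, 1]."""
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator tries to tell real training images from generated ones.
    fake_images = generator(torch.randn(batch, latent_dim)).detach()
    d_loss = loss(discriminator(real_images), real_labels) + \
             loss(discriminator(fake_images), fake_labels)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator tries to "outwit" the discriminator into calling its output real.
    g_loss = loss(discriminator(generator(torch.randn(batch, latent_dim))), real_labels)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```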
““Regarding merging frames and optical flow, I also wrote:
“““When we are looking at video, as opposed to still photos, we are seeing hundreds if not thousands of frames, many of which are essentially the same image at slightly different angles. In addition to conventional techniques, we can compare those images to each other, likely providing a clearer resolution in aggregate – as static, pixelation and other flaws are removed in favor of clearer views.
“““Optical flow has been used to analyze videos and create frames that fill in the gaps between normally filmed video frames, in order to produce smooth slow-motion clips without the requisite data. The aggregate data could be merged in a similar way to create clearer images.
https://people.cs.umass.edu/~hzjiang//projects/superslomo/
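A crude, non-learned sketch of that idea, assuming OpenCV's classical dense optical flow rather than the trained network linked above; it only illustrates reusing motion estimates to approximate a missing in-between frame:

```python
# Sketch: approximate an in-between frame with classical dense optical flow (OpenCV assumed).
# A rough stand-in for learned interpolation such as the Super SloMo work linked above.
import cv2
import numpy as np

def midpoint_frame(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    # Dense per-pixel motion estimated from frame_a to frame_b.
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Sample frame_b halfway back along the motion - a crude approximation of the missing frame.
    map_x = (grid_x + 0.5 * flow[..., 0]).astype(np.float32)
    map_y = (grid_y + 0.5 * flow[..., 1]).astype(np.float32)
    return cv2.remap(frame_b, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```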
“““We can also undoubtedly stitch together accurate images of some faces based on different glimpses from different angles, given sufficient data. Ironically, some deep fake work may help here, because of the efforts to fill in unseen parts of a face when a deep fake moves to reveal parts of a person never seen in the hijacked original clip.”
““Building on techniques such as image de-noising and optical flow, GANs could also use the many images recorded over time in a video as reference points for more elaborate and ambitious de-noising of corrupted or low-resolution images, and also for assembling a usable and identifiable face from multiple partial views at different angles.
““Since effects such as optical flow demonstrate the ability to keep track of multiple points in one frame as they correspond to the same points in the next, we already have the capacity to automate the comparison of one frame with the next, and eventually multiple frames.
““Even erasing flaws by comparing three or four frames would be helpful, much less ten or twenty or a hundred.
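Here is a minimal, non-learned sketch of that multi-frame comparison, again assuming OpenCV: align neighboring frames onto one reference frame with dense optical flow, then take a per-pixel median so transient static and dropouts fall away:

```python
# Sketch: clean one frame by merging it with flow-aligned neighbors (OpenCV assumed).
import cv2
import numpy as np

def align_to(reference: np.ndarray, neighbor: np.ndarray) -> np.ndarray:
    """Warp a neighboring frame so its content lines up with the reference frame."""
    ref_gray = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)
    nbr_gray = cv2.cvtColor(neighbor, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(ref_gray, nbr_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = ref_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(neighbor, map_x, map_y, interpolation=cv2.INTER_LINEAR)

def merge_frames(reference: np.ndarray, neighbors: list) -> np.ndarray:
    """Per-pixel median across the reference and its aligned neighbors."""
    stack = [reference] + [align_to(reference, n) for n in neighbors]
    return np.median(np.stack(stack), axis=0).astype(np.uint8)
```

A trained GAN can go far beyond a median, but even this crude merge shows why ten or twenty frames are better than three or four.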
““How can we expedite the training of GANs to do this?
““Obviously, we could start with a database featuring a combination of video clips and photos of the same subjects. Just taking high-resolution video, selecting one frame out as an end goal, and then having the GAN practice with a lower-resolution and/or corrupted version of that video would suffice for the earliest stage of training. Ideally, we would eventually be able to make substantial gains with even markedly substandard clips, potentially even surpassing the clarity of the original high-resolution frame.
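A minimal sketch of manufacturing those training pairs from any high-resolution clip, assuming OpenCV; the downscale factor and noise level are arbitrary stand-ins for real-world degradation:

```python
# Sketch: build (degraded, target) training pairs from a high-resolution clip (OpenCV assumed).
import cv2
import numpy as np

def degraded_pairs(video_path: str, scale: float = 0.25, noise_sigma: float = 12.0):
    """Yield (low-quality frame, original frame) pairs for enhancement training."""
    capture = cv2.VideoCapture(video_path)
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        # Downscale then upscale to simulate a low-resolution source.
        small = cv2.resize(frame, (int(w * scale), int(h * scale)))
        blurry = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)
        # Add sensor-style noise to simulate a corrupted recording.
        noise = np.random.normal(0, noise_sigma, frame.shape)
        corrupted = np.clip(blurry.astype(np.float32) + noise, 0, 255).astype(np.uint8)
        yield corrupted, frame
    capture.release()
```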
““We could eventually progress to combining multiple astronomical or surveillance photos to hone the basic technique further, though photos and video featuring identifiable faces would feature in our best training sets.
““To provide a large number of potentially public domain clips for this work, video clips of public figures such as politicians or celebrities could be used not only to practice enhanced resolution, but to practice recognizing them in unrelated photos, potentially taken at different times, wearing different hairstyles, makeup, costumes and having their identities otherwise obscured. In later stages, pulling celebrities out of random crowds in Hollywood, LA, Nashville and New York could also refine this system.
““A more advanced training exercise would be to take a database of photos of people with top-secret clearances and other individuals of interest, and run facial recognition on everyone a known, monitored operative makes contact with, using the same techniques to enhance image quality and to analyze databases.
““Other training sets would include spotting and identifying a known figure in crowds, from different angles, using different photos or video clips. Merely having known analysts and other government employees included in the project walk outside buildings visible from multiple vantage points by multiple security cameras and other surveillance systems, both at predictable times (the beginning and end of a workday, lunch hour) and unanticipated ones, allows you to train a system to track individuals of interest through security systems in sensitive areas. A critical benchmark would be the ability to acquire and track figures who are not the focus of a normal surveillance camera at a considerable distance, despite crowds, poor visibility and other issues. Once this is achieved, many possibilities open up for you.
““Eventually, you could use this method as an additional means of continuously tracking hostile operatives, dangerous criminals and known terrorists – not only in real time, but tracing their paths and encounters everywhere and at any time in which you have such footage – say, from DC government security cameras – eventually creating an extensive map of their activities which can be added to whatever other intelligence and evidence you may have on them. There would inevitably be legal limits to this, particularly with US citizens not under warrant, but those would be for the relevant law enforcement and counterintelligence agencies to consider.
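The matching step behind that kind of tracking can be sketched very simply, assuming the open-source face_recognition library and pre-extracted, timestamped frames from each camera; every name here is a placeholder:

```python
# Sketch: log sightings of one known individual across timestamped camera frames.
# Assumes the open-source `face_recognition` library; feeds and IDs are placeholders.
import face_recognition

def sightings(known_encoding, frames, tolerance: float = 0.55):
    """frames: iterable of (camera_id, timestamp, image array). Yields matches."""
    for camera_id, timestamp, image in frames:
        for encoding in face_recognition.face_encodings(image):
            if face_recognition.face_distance([known_encoding], encoding)[0] <= tolerance:
                yield camera_id, timestamp
                break
```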
““Such a map could be merged with other temporal and/or geolocated data – bank transfers, texts, calls, cryptocurrency exchanges, other persons of interest in the vicinity – to further enrich this map. I can elaborate on this option at greater length in the future.”
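One way to picture that enrichment, assuming pandas and two hypothetical, already-timestamped tables – camera sightings and financial or communication events – is a tolerance-windowed time join:

```python
# Sketch: join camera sightings to nearby-in-time transactions (pandas assumed; data hypothetical).
import pandas as pd

sightings = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-01-05 12:03", "2020-01-05 18:40"]),
    "subject": ["subject-A", "subject-A"],
    "location": ["DC camera 114", "DC camera 31"],
}).sort_values("timestamp")

events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-01-05 12:09", "2020-01-05 19:05"]),
    "event": ["wire transfer", "burner call"],
}).sort_values("timestamp")

# Each sighting picks up the nearest event within a 30-minute window, if any.
enriched = pd.merge_asof(sightings, events, on="timestamp",
                         tolerance=pd.Timedelta("30min"), direction="nearest")
print(enriched)
```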
“Briefly, if one frame of a video can be de-noised through a reference to one or preferably many other frames in the same video, it should be self-evident that all of them could be.
“We can, however, further enhance our output by collecting video transmissions from multiple sources and knitting them together, whenever we know there is a person of interest to be tracked.
“Multiple security cameras, cell-phone videos and even photos can be merged to this end. Bear in mind that many of the recordings are both timestamped and geolocated. Indeed, the area covered by many security cameras never really changes. Which means that training GANs to merge and enhance images further based on multiple perspectives is a real option, as is using high resolution photographs as a touchstone.
“Using multiple video sources can also enhance video quality, like binocular summation in humans and many animals, but in this case adding to the aggregate sharpening of the end product.
“But they can also provide multiple angles on a scene, critical if you need clear views of each of the participants in a conversation. GANs could easily be trained to merge images and videos in this way, and to use the additional reference frames to enhance the final product.
“Further, audio taken deliberately or incidentally may also prove useful. Aside from any recordings intentionally made at the time, such as from hacked mobile devices, nearby phone calls create a microphone on the scene. Multiple calls, if they can be legally accessed, could be filtered for relevant output, merged and de-noised.
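A minimal sketch of that filtering and merging, assuming NumPy and SciPy and several already-decoded mono recordings of the same scene at a common sample rate: estimate each recording's offset against a reference by cross-correlation, shift it into alignment, and average:

```python
# Sketch: align several recordings of the same scene and average them (NumPy/SciPy assumed).
import numpy as np
from scipy.signal import correlate

def merge_recordings(recordings: list) -> np.ndarray:
    """recordings: list of 1-D float arrays at the same sample rate."""
    reference = recordings[0]
    aligned = [reference]
    for rec in recordings[1:]:
        # Lag at which this recording best lines up with the reference.
        corr = correlate(reference, rec, mode="full")
        lag = int(np.argmax(corr)) - (len(rec) - 1)
        shifted = np.roll(rec, lag)
        aligned.append(shifted[: len(reference)] if len(shifted) >= len(reference)
                       else np.pad(shifted, (0, len(reference) - len(shifted))))
    # Averaging reinforces the shared signal and suppresses uncorrelated noise.
    return np.mean(np.stack(aligned), axis=0)
```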
“Here is a description of a neural network which learned to view faces and isolate only the speech of selected figures in a video. While this shows promise given our desire to hear what surveilled figures may be saying, remember that GANs could easily be trained to pick out keywords and listen to conversations coming from the people saying them or to track the voices of individuals otherwise noted as important.
“This AI Learned to Isolate Speech Signals
“Remember also that, given the circumstances, we may have not only people being recorded, but more than one suspect being tracked and recorded on their calls under warrant, voicemails capturing relevant, time-stamped audio, and unrelated individuals willing to share video clips, calls or voicemails of their own that happen to pick up relevant background “noise.”
“The other detail to remember is that lip reading is entirely possible from video, and GANs can be trained to do this as well.”