Vladislav Tushkanov – Securelist

What does ChatGPT know about phishing?
https://securelist.com/chatgpt-anti-phishing/109590/ (1 May 2023)

Hearing all the buzz about the amazing applications of ChatGPT and other language models, our team could not help but ask this question. We work on applying machine learning technologies to cybersecurity tasks, specifically models that analyze websites to detect threats such as phishing. What if all our complex, multi-layered detection systems are now obsolete, and a single LLM (large language model) to rule them all can fight cybercriminals better than we do?

To answer this question, we conducted an experiment to see how good ChatGPT is at recognizing overtly malicious links. In this article, we provide a practical example of how an LLM can be applied to a real, albeit simplified, cybersecurity task, and describe all the interesting things we learned along the way. Since we didn’t have any illusions that ChatGPT could actually be used as a detection method, our primary goal was to find out how much cybersecurity knowledge ChatGPT has learned from its training data and how it could possibly assist a human analyst in their job.

Why URL classification?

An analyst’s trained eye can spot an obviously malicious link in a split second (and you probably can too):

hxxp://caseld-10xxxx.info/1/Office365/0fflce3.6.5/live/login.php

  • This is not the Office365 login link, is it?

(links in the article are partially masked with x)

A rule of thumb for finding tasks to which ML can be applied is that if a person can do it instantly, almost without thinking, it’s probably a good candidate. Still, building a machine-learning system requires a lot of effort, from gathering data and crafting features to preparing the infrastructure. The promise of replacing all that with a single interface configured via natural language sounds too good to be true, but just how far is it from being a reality?

When conducting this experiment, we were well aware of the limitations of large language models, as well as the fact that OpenAI asks that this emerging technology not be applied to any high-risk area, and states the following about GPT-4’s cybersecurity applications in its system card:

However, GPT-4 has significant limitations for cybersecurity operations due to its “hallucination” tendency and limited context window.

It is natural to assume that GPT-3.5-turbo will exhibit these limitations to a greater extent.

We chose the URL classification task because:

  • Unlike a whole HTML page, a URL fits easily into an LLM, because modern LLMs, such as ChatGPT, can only consume a limited number of symbols (more precisely, tokens) at a time (they have a fixed context window, in LLM-speak).
  • Obviously malicious links are easy to detect, but the task is still non-trivial, because it requires world knowledge (for example, knowing the names of popular web services).
  • URLs are still text, and LLMs are good at processing text.

So here is what we did.

Let’s talk?

First of all, we opened the ChatGPT web interface and checked if what we wanted was even possible:

Judging by the URL, is this link a phishing attempt? Please explain why. Link: hxxp://caseld-10xxxx.info/1/Office365/0fflce3.6.5/live/login.php

Yes, it is likely a phishing attempt.

The URL appears to be intentionally deceptive, as it contains several red flags:

  • The domain name “caseld-10xxxx.info” is not associated with any reputable or recognizable entity, and it looks like a random sequence of letters and numbers.
  • The URL uses the term “Office365” which is a popular productivity suite, and it is a common target for phishing attacks. Cybercriminals often use this tactic to lure unsuspecting victims into clicking on a link and disclosing sensitive information.
  • The word “live” in the URL could be used to imply that the page is an official Microsoft Office365 login page, as Microsoft has used “live” in the past in some of its services.
  • The misspelling of “Office” as “0fflce” is another red flag as it is a common tactic used by phishers to trick users into believing the website is legitimate.
  • The “.php” extension at the end of the URL suggests that it is a PHP script running on a server, which is commonly used in phishing attacks to harvest login credentials.

Therefore, it is advisable to avoid clicking on such links and to always check the URL’s legitimacy before entering any personal information or credentials.

This is a pretty reasonable explanation. Aside from the fact that the model shows some distrust of PHP (well, who doesn’t?), it correctly recognizes the target organization and the homographic attack (0fflce), and even points out why “live” might have been used. This is an explanation worthy of a junior analyst.

To gather a corpus of explanations, we used the OpenAI API to query the GPT-3.5-turbo model, which is the backend for ChatGPT. It is fast and cheap (we checked a lot of links), but still powerful enough. For the test corpus, we gathered a few thousand links that our detection technologies deemed phishing, and threw in a few thousand safe URLs.

Attempt 1. Is it phishing?

ChatGPT is great at providing the answer in the form that you require – from poems:

And lastly, “0fflce” it doth spell amiss,

A trick oft used by phishers with a hiss.

Beware, dear user, and take great care,

Lest thou fall victim to this cyber snare.

to machine-readable formats. It can also do more than one thing at once. This means that in addition to the yes/no answer to the question about link safety, we could ask ChatGPT to format the response as JSON and also to include a rationale, as well as the possible target. So, after a few iterations, we settled on the following prompt:

{"role": "system", "content": "You are CyberGPT, a guru cyber security assistant AI."},
        {
            "role": "user", 
            "content": 'Does this link lead to a phish website? Answer in the json format {"phish": "yes|no", "reasoning": "detailed explanation why", "target_company": "possible targeted company"}\nLink: ' + url
        }

We use a system prompt here (according to practitioners and docs, this is less important than the user input) that portrays ChatGPT as a cybersecurity chatbot and ask for its verdict in the user prompt. Frankly, the fact that it is so easy to get the desired output from ChatGPT is awe-inspiring.
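For reference, the full API call might look roughly like the sketch below. This uses the openai Python package as it existed at the time of the experiment (pre-1.0 ChatCompletion interface); the API key is a placeholder and error handling is omitted.

import openai

openai.api_key = "sk-..."  # placeholder, not a real key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,  # the prompt shown above
)
verdict_json = response["choices"][0]["message"]["content"]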

It took us two nights to get about 6,000 verdicts (probably because we ran the experiment at the peak of all the hype), and once we got the data, we proceeded to parse the results.

Unfortunately, while most of the outputs could be parsed as is, some were corrupted: a few contained additional text, such as “Note: be careful when opening links…”, or just a stray dot at the end, while others had unnecessary escape characters. All in all, less than 0.5% of the JSONs were broken beyond automated fixes or contained text such as:

I am sorry, but as an AI language model, I am not able to access the internet or browse the website provided. However, as per the standard industry practices, you can check the legitimacy of the website by running it through a reputable online phishing detection service/provider. Additionally, if there are any official reports of phishing attacks or scams associated with the target company, it further increases the likelihood of the link leading to a phish website.

Which is still a reasonable answer.
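A rough sketch of the kind of lenient post-processing this required (not our exact code, just an illustration of the approach):

import json
import re

def parse_verdict(raw: str):
    """Best-effort extraction of the JSON verdict from a model reply."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # grab the {...} block, ignore surrounding commentary
    if not match:
        return None  # e.g. "I am sorry, but as an AI language model..."
    text = match.group(0)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        try:
            return json.loads(text.replace("\\", ""))  # drop stray escape characters and retry once
        except json.JSONDecodeError:
            return None  # broken beyond automated fixes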

After discarding the errors and rejections, we were left with a dataset of 2322 phishing and 2943 safe URLs. The resulting metrics are:

  • Detection rate: 87.2%
  • False positive rate: 23.2%

While the detection rate is very high, the false positive rate is unacceptable. Imagine if every fifth website you visited was blocked! Sure, no machine-learning technology on its own can have a zero false positive rate, but this number is far too high. To put it in perspective, in the well-known URLNet paper, where Le et al. solved a very similar task, the authors achieved roughly the same detection rate (~85%) with a false positive rate of about 0.4% using a specially crafted convolutional neural network. A more recent paper, URLTran by Maneriker et al., uses Transformers (the same architecture that powers ChatGPT) to achieve an even lower FPR.

ROC curve from Le et al., 2018, illustrating achievable TPR (DR)/FPR values

Unfortunately, there are no miracles here. This is also where we met the first limitation – we can’t adjust the threshold to trade FPR for DR. Or can we?

The second experiment: ‘Is it safe?’

Our prompt was probably too specific and cued the language model to view the link with suspicion. Instead of asking if the link is phishing, we could ask if it is safe to visit, and maybe the LLM would behave less suspiciously. This led us to the second prompt:

{"role": "system", "content": "You are a helpful internet security assistant chatbot."},
        {
            "role": "user", 
            "content": 'Is this link safe to visit? '
            'Answer in the json format: {"safe": "yes|no", "yes_confidence": 0.5}'
            '\nwhere "yes_confidence" is the confidence that the link is safe from 0 to 1. '
            'Only output json.'
            '\nLink: ' + url
        },

Differences from the previous prompt:

  • Gone is the ‘cybersecurity guru’, who by nature should be suspicious of random links
  • We don’t ask for the target, so as not to alert the model that this is a phishing detection task
  • We do, however, ask for a confidence value. Though it is unlikely to be well calibrated, it may prove useful.

The results were quite discouraging:

  • Detection rate: 93.8%
  • False positive rate: 64.3%

It turns out that the more general prompt is more likely to elicit a verdict that the link is dangerous. This is how the verdicts for links from the safe subset of data changed, compared to the previous cybersecurity guru prompt:

                                            Internet security assistant chatbot verdicts (safe data)
CyberGPT cybersecurity guru verdicts        safe        unsafe
safe                                        1049        1210
phish                                          2         682

Interestingly, the confidence choices are more like human-assigned confidence in the sense that round numbers like 90% occur most often:

Top 5 probabilities that the model chose. Note that they are rounded to 0.1.

At least the ROC curve looks alright, but there is no point at which this prompt would outperform the cybersecurity guru from the previous attempt:

ROC curve based on probabilities provided by ChatGPT
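The curve itself is straightforward to build from the returned confidence values; a sketch with illustrative (toy) data in place of the real labels and scores:

from sklearn.metrics import roc_curve, auc

y_true = [1, 1, 0, 0, 1, 0]                 # 1 = phishing URL, 0 = safe URL (toy labels)
y_score = [0.9, 0.8, 0.3, 0.6, 0.7, 0.1]    # 1 - yes_confidence reported by the model

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", auc(fpr, tpr))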

To conclude, it is hard to predict the behavior of the LLM based on human reasoning about the prompt. Probably the “safety” of a link is a vaguer concept than “phish”. Understanding why the first prompt won by a large margin is difficult, and choosing the best prompt at this time, despite the emergence of new prompt engineering techniques, is more art than science.

Verdict stability

An important fact about LLMs is that the generation process is random. While you can make an LLM almost deterministic by setting its temperature parameter, which controls the output randomness, to 0 (numerical sources of non-determinism still remain), doing so usually hurts performance (for example, causing the model to repeat the same phrase over and over). In practice, for each token (a word or part of a word) the network generates, it samples from several likely alternatives (this, too, is controlled by parameters).

Non-deterministic verdicts are not the top feature you expect from a detection system. To measure its impact on the results, we conducted another small experiment in which we chose 200 random URLs (100 safe and 100 phish) and gathered verdicts with the same simplified prompt:

{"role": "system", "content": "You are a helpful internet security assistant chatbot."},
        {
            "role": "user", 
            "content": 'Is this link safe to visit? '
            'Answer in the json format: {"safe": "yes|no"}'
            '\nOnly output json.'
            '\nLink: ' + url
        },

This is how the results of these two identical runs compare (the responses are not post-processed and answer the question of whether the link is safe):

                Run 2
Run 1           no      unknown     yes
no              145     2           3
yes             4       0           46

So, the verdict for 4.5% of the URLs changed due to the randomness of the LLM, with the variation between runs on safe URLs being 7%. Again, while this can be remedied by tweaking the temperature, it’s an important factor to keep in mind when using LLMs.

Organization extraction

The most striking aspect of ChatGPT’s performance was the extraction of the potential phishing target. When attackers craft their samples, they try to trick users into believing that the URL is genuine and belongs to a company, while also obfuscating it enough to fool automated analysis. Extracting the attack target can be useful in many scenarios.

ChatGPT has enough real-world knowledge of many internet and financial services, and with only a small post-processing step (e.g., merging “Apple” and “iCloud” or removing “LLC” and “Inc”) it does a very good job at extracting them, identifying a target more than half the time. Note that not every URL contains a reference to a target organization. The identified companies included major tech portals (Facebook, TikTok, Google), marketplaces (Amazon, Steam), banks from all over the world, from Australia to Russia, as well as cryptocurrency and delivery services.
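The post-processing itself was a simple lookup and suffix-stripping step; a sketch of what such normalization might look like (the alias map here is illustrative, not our full list):

ALIASES = {"icloud": "Apple", "office 365": "Microsoft", "bofa": "Bank of America"}  # illustrative
SUFFIXES = (" llc", " inc", " inc.", " ltd")

def normalize_org(name: str) -> str:
    n = name.strip().lower().rstrip(".,")
    for suffix in SUFFIXES:
        if n.endswith(suffix):
            n = n[: -len(suffix)].rstrip(" ,")
    return ALIASES.get(n, n.title())

print(normalize_org("iCloud"))        # -> Apple
print(normalize_org("Steam, Inc."))   # -> Steam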

Some interesting examples include:

URL: hxxps://steancxxxmuunity.com/profiles/765xxxxxxxxxx175

Organization: Steam

Reasoning: The domain name ‘steancxxxmuunity.com’ appears to be a typo-squatting attempt. The correct spelling of the domain should be ‘steamcommunity.com’. The website is trying to impersonate Steam Community, a popular gaming platform…

URL: https://cxxxxxxxxr.com/wp-includes/js/jcrop/cgi/BOfA/Validation/login.php?cmd=login_submit

Organization: Bank of America

Reasoning: The link is suspicious and indicates a phishing attempt because the domain name is not related to the possible targeted company (Bank of America), the URL is also suspicious containing additional characters such as ‘/wp-includes/js/jcrop/cgi’…

URL: http://caseld-10xxxx.info/1/Office365/0fflce3.6.5/live/login.php

Organization: Microsoft

Reasoning: The link’s domain name (caseld-10xx93.info) does not appear to be related to Microsoft’s Office 365 service. The URL path also contains a misspelled version of the word ‘office’ (0fflce) and a suspiciously old version number (3.6.5).

URL: http://adobexxxsignxxxx.duckdns.org/ees/

Organization: Adobe

Reasoning: The domain name ‘duckdns.org’ is a free dynamic DNS provider which is often used by cyber criminals for hosting phishing sites or other malicious content. Additionally, the subdomain adobexxxsignxxxx in the URL appears to imitate the legitimate Adobe service ‘Adobe Sign’, which suggests that the attackers may be targeting Adobe’s customers for information theft or fraud.

Here we see that ChatGPT detects homographic attacks (although it can confuse them with typo-squatting), and correctly states that the domains are not related to the companies. While it fails to untangle that “3.6.5” is an imitation of 365, the explanations are quite reasonable.

Implicit knowledge: reasoning analysis

The examples above are cherry-picked. When cherry-picking with the opposite intention, some funny things happen. The explanations that ChatGPT provides may include:

  • References to WHOIS, which the model doesn’t have access to:
    • Finally, if we perform a WHOIS lookup for the domain name, it was registered very recently (2020-10-14) and the registrant details are hidden.
  • References to content on a website that the model doesn’t have access to either:
    • the website is asking for user credentials on a non-Microsoft website. This is a common tactic for phishing attacks.
    • Additionally, the webpage’s layout and design are inconsistent with eBay’s official website, indicating that it may be a fake website designed to impersonate eBay.
    • There is also no information available on the website’s security and privacy policy. A secure website should provide that information transparently to users.
  • References to SSL certificate expiration:
    • The domain ‘duckdns.org’ is not owned by Netflix, and the SSL Certificate is invalid.
  • Misstatements:
    • The domain ‘sxxxxxxp.com’ is not associated with Netflix and the website uses ‘http’ protocol instead of ‘https’ (the website uses https)
  • Revelatory nuggets of cybersecurity information:
    • The domain name for the URL ‘yxxxx3.com’ appears to be registered in Korea which is a red-flag.

These are funny and highlight the limitations of an LLM: while it can generate reasonable explanations, it is also prone to hallucinations – such as imagining contexts that it does not have access to, or coming up with facts that are just plain wrong.

In general, however, we see that the model demonstrates a non-trivial understanding of the task. Analyzing all the explanations by hand would be laborious, so we do some distant reading here to get a bird’s-eye overview of the reasonings it generates by applying a little bit of old-school NLP.

First of all, we noticed that the model used mostly short, terse sentences with one point per sentence in its explanations. We split the statements into sentences using the NLTK Punkt tokenizer, vectorized the resulting sentences with an MPNet model from the Sentence-BERT package, projected them into 2D with UMAP, and colored the resulting plot using DBSCAN clustering. A few iterations of hyperparameter tweaking yielded the result shown in the figure below.
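A condensed sketch of this pipeline might look as follows; the checkpoint name, the hyperparameters and the explanations variable are placeholders rather than the exact ones we ended up with.

import matplotlib.pyplot as plt
import umap
from nltk.tokenize import sent_tokenize               # Punkt tokenizer; nltk.download("punkt") is needed once
from sentence_transformers import SentenceTransformer
from sklearn.cluster import DBSCAN

# `explanations` holds the few thousand reasoning strings gathered from the model earlier
sentences = [s for text in explanations for s in sent_tokenize(text)]

encoder = SentenceTransformer("all-mpnet-base-v2")     # an MPNet-based Sentence-BERT checkpoint (assumed)
embeddings = encoder.encode(sentences)

coords = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(embeddings)
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(coords)

plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=2, cmap="tab20")
plt.show()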

Cluster analysis of explanations provided by ChatGPT. Semantically close reasons form dense clusters.

These dense clusters reveal the most popular phrases that the model provides as explanations. Some of these clusters are:

  • Mentions of particular target companies.
  • Recommendations not to click the link.
  • Certain statements about the website being phish or not.
  • Statements of uncertainty.
  • References to HTTPS (or lack thereof) or SSL (usually hallucinations)
  • Sentences like “This is a common tactic…”

However, some of them reveal “common tactics”:

  • Use of IPFS: “The URL looks suspicious and appears to be hosted on IPFS which can be used to host malicious content.”
  • Suspicious elements in path, including homographic attacks: “This link is an example of a homograph attack also known as ‘IDN homograph attack’.”
  • Elements in path or domain that are related to security and urgency: “Lastly, the URL has a sense of urgency to it, indicating that the user needs to click immediately.”
  • Use of dynamic DNS: “The domain name ‘duckdns.org’ is a free dynamic DNS provider that can be used by anyone to create custom subdomains, which makes it easy for attackers to use for phishing emails or websites impersonating legitimate companies”.
  • An email in the parameters: “Additionally, the email address used as a query parameter may indicate a potential phishing attempt.”
  • Mention of an organization in the path or subdomain while the domain is not related to the organization: “The link appears suspicious as it contains a subdomain that is not related to Amazon.”
  • Elements in the domain that look autogenerated: “Also, the path ‘bafyxxxxxxxxxxxxxxxxxx43ky’ suggests that this is an autogenerated link.”

These points make a lot of sense. This means that the training data for ChatGPT contained enough cybersecurity material for the LLM to learn the telltale signs of phishing attempts and use this knowledge to analyze specific cases, which is very impressive.

Conclusion

As we have seen, for a zero-shot system the results are amazing: they are hard to argue with, even for a die-hard sceptic. Anecdotal examples, especially cherry-picked ones, are also very impressive. However, when we talk about real-world performance, a screenshot from Twitter is not a solid basis for decision-making. This is especially true in cybersecurity, where applying automated scenarios to our tasks means going far beyond “good” or even “impressive” performance.

As with many other use cases, the core detection task is only part of the problem. Creating signatures or blocklist records, validating and testing them, delivering them, and ensuring quality of service are less glamorous but just as important, and here a lot of hard human work is needed to get things rolling.

Moreover, using LLMs means applying a whole new set of skills. We could have improved the results in detection, target organization extraction and reasoning alike by trying various things:

  • Using a stronger model, such as GPT-4
  • Using special prompt-designing patterns, such as Chain-of-Thought-prompting or Reflect-Critique-Improve
  • Feeding parts of the HTML along with the URL
  • Adjusting the system prompt and sampling parameters, such as temperature, etc.

However, this means abandoning the hope of a zero-shot AI that “just works”. Also, using LLMs means dealing with their current idiosyncrasies, such as the demonstrated tendency to hallucinate (manifested by coming up with imaginary explanations and referring to page texts it has never seen), randomness of output, as well as possible prompt injection attacks.

On the other hand, if we look at the organization extraction and reasoning, we have to admit that it performs on par with what we would expect from an intern-level phishing analyst: it is good, but never leave it without supervision! Therefore, it is possible to use this type of technology to assist flesh-and-blood analysts by highlighting suspicious parts of the URL and suggesting possible attack targets. It could also be used in weak supervision pipelines to improve classic ML pipelines.

All in all, while impressive, ChatGPT and LLMs are not yet ready to radically change the cybersecurity game, at least as far as phishing detection is concerned. At Kaspersky, we continue to investigate cutting-edge technologies and study their potential impact on both cyberattackers and defenders. Let’s see what happens next.

Privacy predictions 2023
https://securelist.com/privacy-predictions-2023/108068/ (28 November 2022)

Our last edition of privacy predictions focused on a few important trends where business and government interests intersect, with regulators becoming more active in a wide array of privacy issues. Indeed, we saw regulatory activity around the globe. In the US, for example, the FTC has requested public comments on the “prevalence of commercial surveillance and data security practices that harm consumers” to inform future legislation. In the EU, lawmakers are working on the Data Act, meant to further protect sensitive data, as well as a comprehensive AI legal strategy that might put a curb on a range of invasive machine-learning technologies and require greater accountability and transparency.

On the other hand, we saw the overturning of Roe v. Wade and the subsequent controversy surrounding female reproductive health data in the US, as well as investigations into companies selling fine-grained commercial data and facial recognition services to law enforcement. This showed how consumer data collection can directly impact the relationship between citizens and governments.

We think the geopolitical and economic events of 2022, as well as new technological trends, will be the major factors influencing the privacy landscape in 2023. Here we take a look at the most important developments that, in our opinion, will affect online privacy in 2023.

  1. Internet balkanization will lead to a more diverse (and localized) behavioral tracking market and to checks on cross-border data transfers.

    As we know, most web pages are crawling with invisible trackers, collecting behavioral data that is further aggregated and used primarily for targeted advertising. While there are many different companies in the business of behavioral ads, Meta, Amazon, and Google are the unquestionable leaders. However, these are all US companies, and in many regions, authorities are becoming increasingly wary of sharing data with foreign companies. This may be due to an incompatibility of legal frameworks: for example, in July 2022, European authorities issued multiple rulings stating use of Google Analytics may be in violation of GDPR.

    Moreover, the use of commercial data by law enforcement (and potentially intelligence bodies) makes governments suspicious of foreign data-driven enterprises. Some countries, such as Turkey, already have strict data localization legislation.

    These factors will probably lead to a more diverse and fragmented data market, with the emergence and re-emergence of local web tracking and mobile app tracking companies, especially on government and educational websites. While some countries, such as France, Russia, or South Korea, already have a developed web tracking ecosystem with strong players, more countries may follow suit and show a preference for local players.

    This might have various implications for privacy. While big tech companies may spend more on security than smaller players, even they have their share of data breaches. A smaller entity might be less interesting for hackers, but also faces less scrutiny from regulatory bodies.

  2. Smartphones will replace more paper documents.

    Using smartphones or other smart devices to pay via NFC (e.g., Apple Pay, Samsung Pay) or QR code (e.g., Swish in Sweden, SBPay in Russia or WeChat in China) is rapidly growing and will probably render the classic plastic debit and credit card obsolete, especially where cashless payments already dominate. COVID-19, however, showed that smartphones can also be used as proof of vaccination or current COVID-negative health status, as many countries used dedicated apps or QR codes, for example, to provide access to public facilities for vaccinated citizens.

    Why stop there? Smartphones can also be used as IDs. A digitized version of an ID card, passport or driver license can be used instead of the old-fashioned plastic and paper. In fact, several US states are already using or plan to use digital IDs and driver licenses stored in Apple Wallet.

    Having your ID stored on a phone brings both convenience and risks. On the one hand, a properly implemented system would, for example, allow you to prove at a store that you are of legal age to buy alcohol without showing the cashier the whole document with other details such as your name or street address. Digitized IDs can also significantly speed up KYC procedures, for example, when applying for a loan online from a smartphone.

    On the other hand, using a smartphone to store an increasing amount of personal data creates a single point of failure, raising serious security concerns. This places serious demands on security of mobile devices and privacy-preserving ways of storing the data.

  3. Companies will fight the human factor in cybersecurity to curb insider threat and social engineering to protect user data.

    As companies deploy increasingly comprehensive cybersecurity measures moving from endpoint protection to XDR (eXtended Detection & Response) and even proactive threat hunting, people remain the weakest link. According to estimates, 91% of all cyberattacks begin with a phishing email, and phishing techniques are involved in 32% of all successful data breaches. Also, a lot of damage can be done by a disgruntled employee or a person who joined the company for nefarious purposes. The FBI has even warned recently that deep fakes can be used by those seeking remote jobs to confuse the employer, probably with the goal of gaining access to internal IT systems.

    We expect fewer data leaks caused by misconfigured S3 buckets or Elasticsearch instances, and more breaches caused by exploiting the human factor. To mitigate these threats, companies might invest in data leak prevention solutions as well as more thorough user education to raise cybersecurity awareness.

  4. We will hear more concerns about metaverse privacy – but with smartphones and IoT, aren’t we already in a metaverse?

    While skeptics and enthusiasts keep fighting over whether a metaverse is a gamechanger or just a fad, tech companies and content creators continue to polish the technology. Meta has recently announced Meta Quest Pro, and an Apple headset is rumored to appear in 2023. Some, however, raise concerns over metaverse privacy. While smartphones with their multiple sensors from accelerometers to cameras can feel quite intrusive, a VR headset is in a league of its own. For example, one of the latest VR headsets features four front-facing cameras, three cameras on each controller and several cameras to track eyes and facial expressions. This means that in a nightmare scenario such devices would not only have a very deep insight into your activity in the metaverse services provided by the platform, they may be very effective, for example, in reading your emotional reaction to ads and making inferences about you from the interior of your home — from what colors you like to how many pets and children you have.

    While this sounds scary (which is why Meta addresses these concerns in a separate blog post), the fears might actually be exaggerated. The amount of data we generate just by using cashless payments and carrying a mobile phone around during the day is enough to make the most sensitive inferences. Smart home devices, smart cities with ubiquitous video surveillance, cars equipped with multiple cameras and further adoption of IoT, as well as continuous digitalization of services will make personal privacy, at least in cities, a thing of the past. So, while a metaverse promises to bring offline experiences to the online world, the online world is already taking hold of the physical realm.

  5. Desperate to stop data leaks, people will insure against them.

    Privacy experts are eagerly giving advice on how to secure your accounts and minimize your digital footprint. However, living a convenient modern life comes with a cost to privacy, whether you like it or not: for example, ordering food deliveries or using a ride-hailing service will generate, at the very least, sensitive geodata. And as the data leaves your device, you have little control over it, and it is up to the company to store it securely. However, we see that due to misconfigurations, hacker attacks and malicious insiders, data might leak and appear for sale on the dark web or even on the open web for everyone to see.

    Companies take measures to protect the data, as breaches cause reputation damage, regulatory scrutiny and, depending on local legislation, heavy fines. In countries like the US, people use class action lawsuits to receive compensation for damages. However, privacy awareness is growing, and people might start to take preventive measures. One way to do that might be to insure yourself against data breaches. While there are already services that recoup losses in case of identity theft, we could expect a larger range of insurance offers in the future.

We have looked at several factors that, in our opinion, will most prominently affect the way data flows, and possibly leaks, between countries, businesses and individuals. As the digital world continues to permeate the physical realm, we expect even more interesting developments in the future.

Privacy predictions 2022
https://securelist.com/privacy-predictions-2022/104912/ (23 November 2021)

We no longer rely on the Internet just for entertainment or chatting with friends. Global connectivity underpins the most basic functions of our society, such as logistics, government services and banking. Consumers connect to businesses via instant messengers and order food delivery instead of going to brick-and-mortar shops, scientific conferences take place on virtual conferencing platforms, and remote work is the new normal in an increasing number of industries.

All these processes have consequences for privacy. Businesses want better visibility into the online activity of their clients to improve their services, as well as more rigorous know-your-customer procedures to prevent fraud. Governments in many countries push for easier identification of Internet users to fight cybercrime, as well as “traditional” crime coordinated online. Citizens, for their part, are increasingly concerned with surveillance capitalism, a lack of anonymity and dependence on online services.

Reflecting on the previous installment of the privacy predictions, we see that most of them indeed have been big trends this year. Most of all, privacy-preserving technologies were among the most discussed tech topics, even if opinions on some of the implementations, e.g. NeuralHash or Federated Learning of Cohorts, were mixed. Nevertheless, things like on-device sound processing for Siri and Private Compute Core in Android are big steps towards user privacy. We have also seen many new private services, with many privacy-focused companies taking their first steps towards monetization, as well as a bigger push for privacy – both in technology and in marketing – on both iOS and Android. Facebook (now Meta) moved towards more privacy for its users as well, providing end-to-end encrypted backups in WhatsApp and removing the facial recognition system in its entirety from Facebook.

While we hope 2022 will be the last pandemic year, we do not think the privacy trends will reverse. What will be the consequences of these processes? Here, we present some of our ideas about what key forces will shape the privacy landscape in 2022.

  1. BigTech will give people more tools to control their privacy – to an extent.

    As companies have to comply with stricter and more diverse privacy regulations worldwide, they are giving users more tools for controlling their privacy as they use their services. With more knobs and buttons, experienced users might be able to configure their privacy settings to the extent that suits their needs. As for less computer-savvy folk, do not expect privacy by default: even when legally obliged to provide privacy by default, enterprises whose bottom line depends on data collection will continue to find loopholes to trick people into choosing less private settings.

  2. Governments are wary of the growing big tech power and data hoarding, which will lead to conflicts – and compromises.

    With governments building their own digital infrastructures to allow both simpler and wider access to government services and, hopefully, more transparency and accountability, as well as deeper insights into the population and more control over it, it is not surprising they will show more interest in the data about their citizens that flows through big commercial ecosystems. This will lead to more regulation: privacy laws, data localization laws, and rules on what data is accessible to law enforcement and when. The Apple CSAM scanning privacy conundrum shows exactly how difficult it can be to find the balance between encryption and user privacy on the one side and pinpointing criminal behavior on the other.

  3. Machine learning is sure great, but we are going to hear more about machine unlearning.

    Modern machine learning often entails training huge neural networks with astounding numbers of parameters (while this is not entirely correct, one can think of these parameters as neurons in the brain), sometimes on the order of billions. Thanks to this, neural networks not only learn simple relationships, but also memorize entire chunks of data, which can lead to leaks of private data and copyrighted materials, or recitations of social biases. Moreover, this leads to an interesting legal question: if a machine learning model was trained using my data, can I, for example, under GDPR, demand to remove all influence that my data had on the model? If the answer is yes, what does it mean for data-driven industries? A simple answer is that a company would have to retrain the model from scratch, which sometimes can be costly. This is why we expect more interesting development, both in technologies that prevent memorization (such as differentially private training) and those that enable researchers to remove data from already trained systems (machine unlearning).

  4. People and regulators will demand more algorithmic transparency.

    Complicated algorithms, such as machine learning, are increasingly used to make decisions about us in various situations, from credit scoring to face recognition to advertising. While some might enjoy the personalization, for others, it may lead to frustrating experiences and discrimination. Imagine an online store that divides its users into more and less valuable based on some obscure LTV (lifetime value) prediction algorithm and provides its more valued customers with live customer support chats while leaving less lucky shoppers to a far-from-perfect chatbot. If you are deemed by a computer to be an inferior customer, would you want to know why? Or, if you are denied a credit card? A mortgage? A kidney transplant? As more industries are touched by algorithms, we expect more discussion and regulations about explaining, contesting and amending decisions made by automated systems, as well as more research into machine learning explainability techniques.

  5. Thanks to work from home, many people will become more privacy-aware – with the help of their employers.

    If you have been working from home due to the pandemic, odds are you have learned lots of new IT slang: virtual desktop infrastructure, one-time password, two-factor security keys and so on – even if you work in banking or online retail. Even when the pandemic is over, the work-from-home culture might persist. With people using the same devices both for work and personal needs, corporate security services would need more security-minded users to protect this bigger perimeter from attacks and leaks. This means more security and privacy trainings – and more people translating these work skills, such as using 2FA, into their personal lives.

To conclude, privacy is no longer a topic just for geeks and cypherpunks; it has become a mainstream subject of public debate touching on personal and human rights, safety and security, and business ethics. We hope that this debate, involving society, business and governments, will lead to more transparency, accountability, and fair and balanced use of personal data, and that legal, social and technological solutions to the most pressing privacy issues will be found.

How we took part in MLSEC and (almost) won
https://securelist.com/how-we-took-part-in-mlsec-and-almost-won/104699/ (28 October 2021)

This summer Kaspersky experts took part in the Machine Learning Security Evasion Competition (MLSEC) — a series of trials testing contestants’ ability to create and attack machine learning models. The event comprises two main challenges: one for attackers and one for defenders. The attacker challenge was split into two tracks — Anti-Malware Evasion and Anti-Phishing Evasion. Even though in our routine work we tend to deal with ML-based protection technologies, this time we decided to step into the attackers’ shoes and chose the offensive tracks. Our Anti-Malware Evasion team consisted of two data scientists — Alexey Antonov and Alexey Kogtenkov — and security expert Maxim Golovkin. In the Anti-Phishing Evasion team there were two more data scientists — Vladislav Tushkanov and Dmitry Evdokimov.

MLSEC.IO Phishing Track

In the phishing track the task was to modify 10 (synthetic) phishing samples (i.e. HTML pages) to convince seven phishing detection models that they were benign. A model was bypassed if it returned a probability of less than 0.1 for each sample. Yet there was a catch: after the modifications the samples had to look the same as before (or, to be exact, the rendered screenshots had to have the same hashes for both the original and the modified HTML files). Models were available via an API, meaning that the setting was black-box. To win, we had to fool as many models as possible for each sample using the fewest possible number of API queries.

What we did

At first, we thought of attempting a classic model replication attack (see AML.T0005 of MITRE ATLAS), but after we entered the competition we observed that the leader already got the highest possible score using just 343 API calls (with one full upload costing 70 calls). This sort of attack would involve submitting a large number of phishing/non-phishing pages to retrieve model outputs and training “shadow” models of our own, which we would then attempt to bypass offline. However, we did not have the API query budget for that, which probably made the setting a bit more realistic. We, therefore, had to rely upon our domain expertise and pure chance. We started off by submitting several clean (benign) web pages — such as Wikipedia’s — to check what probabilities the models would return. But, while the first three models were quite happy with them, the remaining four consistently generated probabilities way above 0.1. From the API response speed we observed that the models might be ingesting raw HTML without any kind of browser rendering.

Littering the page with invisible text

We composed a simple script to check render hashes and tried our first solution — adding a large hidden chunk of Wikipedia text to the phishing sample, fixing invalid HTML and applying minification. This worked, but only for the first three models. We wondered whether some of the models could be overfitting the provided samples. As we went along, we learned that the last three models often had the same scores, so we only tested our samples on the previous model (likely the most sensitive one) to save API calls.
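For reference, the render-hash check might look roughly like the sketch below, using headless Chrome via Selenium; the competition grader's actual implementation was not disclosed, so the file paths, window size and hashing choice here are assumptions.

import hashlib
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--window-size=1366,768")   # fixed window size for reproducible renders
driver = webdriver.Chrome(options=options)

def render_hash(path: str) -> str:
    driver.get("file://" + path)
    # a short wait may be needed here so that scripts finish before the screenshot is taken
    return hashlib.sha256(driver.get_screenshot_as_png()).hexdigest()

assert render_hash("/tmp/original.html") == render_hash("/tmp/modified.html")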

Obfuscation via byte shift

When working on phishing detection, we, like Rutger Hauer in Blade Runner, had seen things. Phishing payloads hidden in Morse code. Phishing pages consisting almost entirely of screenshots in base64. Actual page text encrypted with ROT13. Drawing on this domain knowledge, we devised our first obfuscation scheme:

  1. Create a small fake “personal blog” page.
  2. Take the actual phishing page, shift all the symbols by n and store as a string.
  3. On page load, shift the symbols back and document.write the result back to the page.

However, this turned out to be a bad idea: not only did the shifting process create all kinds of escaping issues, but the last three models still sounded alarms on our samples. We added popular header tags, such as <meta>, which somehow led to poorer results on the first three models. It felt like the last three models were unhappy about the large number of HTML tags or, probably, the high-entropy payload string.

Obfuscation via byte integer encoding

We then tried another kind of obfuscation. Instead of shifting, we encoded each byte as an integer number and put the numbers into invisible <img> tags. This fooled the first four models, but not the last three. However, the results were better. We wondered whether or not the last three models reacted strongly to the obfuscator code. Maybe they didn’t like document.write? We submitted an empty page with the obfuscator present and learned that the probabilities were quite low — from 0.074 for the first four models to 0.19 for the remaining three — meaning the phishy JavaScript was not what the models were looking at.

Less tags, more text

We hypothesized that the models somehow took into account character distribution. Considering their probable aversion to HTML tags, we used a big chunk of raw text — an excerpt from the Bible. We added the relevant <meta> to the header and hid chunks of numericized bytes among the verses, like this:

20:18 And he said, Whether [101 49 99 114 117 119 89 79 108 80 78 90 65 83 83 47 56 122 74 74 57 
69 104 108 85 67 105 72 106 108] they be come out for peace, take them alive; 
or whether they [70 50 43 97 107 121 71 68 48 47 104 105 83 86 86 108 107 106 107 48 114 111 49 
114 78 83 49 85 118 75] be come out for war, take them alive.

This worked! Nine pages out of 10 bypassed all seven models, while the 03.html sample was rejected as too large (the maximum size was experimentally determined to be 10 MB). Moreover, all the probabilities were the same:

Model        00      01      02      03      04       05       06
Probability  0.02    0.02    0.02    0.02    0.084    0.087    0.087

By that time, we still had about 50 API calls left.

Back to base (64)

This obfuscation scheme was highly inefficient: each byte was at least quadrupled. 03.html was a monster of a page, weighing 4 MB due to a few high-res base64-encoded images. We dug into the source code and noticed that some of them were duplicated, so we carved them out. The page slimmed down to 1.7 MB. Alas, to keep the obfuscated page under 10 MB, we were forced to radically increase the numbers-to-text ratio. The last three models sounded alarms, probably because they were suspicious of the unusual character count distribution. However, we learned that if we changed the separator from a space to \n, the sample bypassed them, so the models did at least some kind of processing line by line. In addition to being inefficient, the pages loaded very slowly — so slowly, in fact, that the grading system returned a screenshot equality check failure for 03.html with byte-to-integer obfuscation. We had been wary of using base64 at first, as it was commonly used in the original samples, but in the dying hours of the competition we decided to give it a try and placed base64 chunks among the cloak text:

PGhlYWQ+PG1ldGEgY2hhcnNldD11dGYtOD48bWV0
1:2 And the earth was without form, and void; and darkness was upon
YSBjb250ZW50PSJ3aWR0aD1kZXZpY2Utd2lkdGgs
the face of the deep. And the Spirit of God moved upon the face of the
IGluaXRpYWwtc2NhbGU9MS4wIiBuYW1lPXZpZXdw
waters.
b3J0PjxtZXRhIGNvbnRlbnQ9IlNpdGUgRGVzaWdu

It worked! The page loading time also decreased sharply, and we were able to finally receive the full score for 03.html with the following model outputs:

Model        00       01       02       03       04       05       06
Probability  0.017    0.017    0.017    0.017    0.072    0.076    0.076

Thus, we were able to fool the models and took first place in the anti-phishing track.
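To recap the final scheme, here is a rough sketch of the interleaving step. This is not our competition code: the chunk size and cover text handling are simplified, and the small JavaScript loader that collects the base64 lines, decodes them with atob() and writes the original page back is omitted.

import base64

def cloak(phishing_html: str, cover_text: str, chunk_size: int = 40) -> str:
    """Interleave base64 chunks of the original page with lines of innocuous cover text."""
    b64 = base64.b64encode(phishing_html.encode("utf-8")).decode("ascii")
    chunks = [b64[i:i + chunk_size] for i in range(0, len(b64), chunk_size)]
    cover_lines = [line for line in cover_text.splitlines() if line.strip()]
    interleaved = []
    for i, chunk in enumerate(chunks):
        interleaved.append(chunk)
        interleaved.append(cover_lines[i % len(cover_lines)])
    # a JS loader (omitted here) gathers the base64 lines and document.write()s the decoded page back
    return "\n".join(interleaved)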

MLSEC.IO Anti-Malware Track

The anti-malware track consisted of two parts. The defensive part was about creating anti-malware models resistant to adversarial attacks. Note that these models must meet certain quality requirements regarding FPR/TPR on some hidden dataset, so it is impossible to create a model that would simply detect all files. The task was to provide six models for attacking.

The adversarial part was about modifying 50 malicious files in such a way as to bypass as many defense track models as possible (by making them appear benign). At the same time, the modified files were tested by the sandbox contest system. The files were to behave exactly the same way as the original malicious objects.

In the attack competition there was an API allowing us to get model predictions (0 for clean files, 1 for malicious ones). It was also possible to send files for a complete test that included not only a model classification check, but also a sandbox behavior comparison against the originals.

Dataset

The organizers did not post a dataset for training models (which might have been useful in both the defensive and attacking parts for building and experimenting with models). However, they suggested using the Ember dataset, which has about 200,000 pre-calculated feature vectors for labeled clean and malware files (100,000 for each class), as well as a large number of unlabeled vectors. The dataset offers quite a powerful set of features, including byte distribution histograms and entropy, header features, file sections, information about the file’s readable strings, and more.

First Experiments

The contest topic strongly overlapped with our recent article about attacks on ML models. So we decided to apply the techniques discussed in the article. First, we trained a neural network model on the Ember dataset using its labeled part (assuming that the defenders would do the same). Then, for each target malicious file, we began to iteratively change certain features (specifically, the byte histograms and the string features) using gradient steps, thereby decreasing the probability of “bad” label prediction by the model. After several steps, a new set of features was obtained. Next we had to create a modified file that would have such features. The modified file could be constructed either by adding new bytes to the end of the file (increasing the overlay) or by adding one or more sections to the end of the file (before the overlay).
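In spirit, the feature-space part of this attack looked like the sketch below. This is a toy PyTorch version: the real network, the set of editable feature indices and the step sizes are simplified assumptions, and 2381 is the length of the Ember v2 feature vector.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2381, 512), nn.ReLU(), nn.Linear(512, 1))
# ...assume the network has been trained on the labeled part of the Ember dataset...

def perturb(features, editable_idx, steps=100, lr=0.1):
    """Nudge only the editable features (byte histograms, string stats) to lower P(malware)."""
    x = torch.tensor(features, dtype=torch.float32).unsqueeze(0).requires_grad_(True)
    mask = torch.zeros_like(x)
    mask[0, editable_idx] = 1.0
    for _ in range(steps):
        p_malware = torch.sigmoid(model(x))
        p_malware.backward()
        with torch.no_grad():
            x -= lr * x.grad * mask   # gradient step on the editable features only
            x.clamp_(min=0.0)         # histogram features cannot go negative
        x.grad.zero_()
    return x.detach().squeeze(0).numpy()

The resulting feature vector then still has to be realized as an actual PE file, for example by appending bytes or sections, as described above.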

Note that this file modification method significantly reduced the probability of files getting classified as malware — not only for the attacked neural network, but also for other completely different architecture models we trained on the same dataset. So the first results of the local testing experiments were quite impressive.

Yet five out of six contest models continued detecting those modified files, just like the originals. The only “deceived” model, as it turned out later, was simply too bad at detecting malicious files and easily confused by almost any modification of the original. There were two possibilities: either the participating models did not use any of the features we changed for the attack, or they used heuristics to neutralize the effect of the changes. For example, the basic heuristic proposed by the organizers was to cut off the file’s last sections: this way, the effect of the added sections would simply be ignored.

What features are important for the contest models?

Our further steps:

  1. We tried to find out what features were important for the classifiers. To do this we trained a boosting model on the proposed dataset. Then we proceeded to measure the importance of individual features for target malicious files using Shapley vectors. The picture below shows the features affecting the classification results the most. The color represents the feature value, and the position on the X axis shows whether this value pushes the file into the “clean” or the “malware” zone.

    Feature importance for file classification

    For example, the timestamp feature has a significant impact on the classifier. The smaller its value (e.g., older file), the more the file is considered to be “clean”.

  2. From the highest impact features we selected those that could be changed without breaking the executable file. We assumed that the contestants’ models would behave similarly to our boosting model, since they depended on the same features.

    During our model research, we found that the header, import table and directory table features are sometimes more important than the file section data. So if you take a clean file, remove all of its sections and replace them with sections from a malware file, three out of six models will still consider it “clean”. We also found that one of the models used a heuristic to cut off the last sections: if malware sections were added to the end of a clean file, the model’s verdict would be “clean”, but if they were inserted before the clean ones, the verdict would change to “malware”. In the end, we found a set of features that helped to reliably bypass four of the models; for the other two, we found no consistent method of generating adversarial files (even non-working ones).

    To completely change the section features with only a minor file modification, we discovered an interesting shortcut. To calculate the feature vector, the creators of the Ember dataset used the FeatureHasher class from the sklearn.feature_extraction library. This class turns sequences of (feature name, feature value) pairs into an array of fixed length: first, it determines the position and sign (the sign will be important further on) from the hash of the feature name; then it adds or subtracts (according to the sign) the corresponding feature value at that array position. The section name is used as the key for this hashing, and the value is determined by the section’s size and entropy. Thus, for any given section you can add another one with a specially constructed name, so that the features of the new section fall into the same cell of the hash table — but with the opposite sign. Taking this idea further, you could zero out all the values in the hash table, or construct any other values, by appending sections with the right names and sizes (see the sketch below).
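A toy sketch of this cancellation trick; the section names and sizes here are made up, and the real Ember features also involve entropy, but the idea is the same.

import itertools
import string
import numpy as np
from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(50, input_type="pair")   # Ember-style hashing of (section_name, value) pairs

def section_vector(name: str, size: float) -> np.ndarray:
    return hasher.transform([[(name, size)]]).toarray()[0]

target = section_vector(".evil", 4096.0)        # hypothetical malicious section

# brute-force a section name that hashes into the same bin with the opposite sign
for cand in map("".join, itertools.product(string.ascii_lowercase, repeat=4)):
    if np.array_equal(section_vector(cand, 4096.0), -target):
        print(f"appending a section named '{cand}' of the same size cancels '.evil'")
        break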

About the quality of the attacked models

We tried feeding various legitimate applications to the contestants’ models, such as Notepad, Paint, 7Zip, Putty, etc., including many Windows system executables. To our surprise, the models would very often recognize them as malicious. For example, the SECRET model, which took first place in the defensive part of the contest, detected most of the clean files we fed to it. Other models, too, kept detecting clean applications.

One might incorrectly assume that the best protection strategy for winning the competition would be to recognize all files as malicious except those that are “clean” in the training dataset. In reality, such models don’t work. We think this is because the hidden test dataset is not representative enough to assess the quality of the better models. We further believe that the same Ember dataset was likely used by both the contestants and the organizers, so the models overfitted it. For the next iterations of the contest, we would suggest expanding the test dataset for the defensive part with more clean files.

Final algorithm

As a result of our analysis, the following final algorithm was built for how to modify the target malicious files:

  1. Take a clean file not detected by any competing model. In this case, we selected the system file setupcl.exe (remember that non-system clean files were often detected by some models).
  2. Partially replace the malicious file’s header to make it look like that of a clean file (but the file itself should remain functional at the same time).
  3. Using the described section hash trick, zero out the “malware” section features, then add sections from the relevant clean file to the end of the file to add those “clean” features.
  4. Make changes to the directory table, so it looks more like a clean file’s table. This operation is the riskiest one, since the directory table contains the addresses and virtual sizes, the modification of which can make the file inoperable.
  5. Replace the static imports with dynamic ones (as a result, the import table turns empty, making it possible to fool models).

After these modifications (without checking the file behavior in the sandbox) we already had ~180 competition points — enough for second place. However, as you will learn later, we did not manage to modify all the files correctly.

Results

Some modification operations are quite risky in terms of maintaining correct file behavior (especially those involving headers, directory tables and imports). Unfortunately, there were technical issues on the contest testing system side, so we had to test the modified files locally. Our test system differed in some respects, and as a result some of the modified files failed to pass the contest sandbox, so we scored fewer points and took only 6th place overall.

Conclusion

As anti-phishing experts, we were able to deduce, at least in general, how the models worked by observing their outputs and create an obfuscation scheme to fool them. This shows how hard the task of detecting phishing pages actually is, and why real-life production systems do not rely on HTML code alone to block them.

For us as malware experts, it was interesting to dive into some details of the structure of PE files and come up with our own ways to deceive anti-malware models. This experience will help us improve our own models and make them less vulnerable to adversarial attacks. It is also worth mentioning that despite the many sophisticated adversarial ML techniques described in academia today, a simple heuristic approach to modifying malicious objects turned out to be the winning tactic in the contest. We tried some of the adversarial ML techniques, but straightforward attacks requiring no knowledge of the model architecture or training dataset were still effective in many cases.

Overall, it was an exciting competition, and we want to thank the organizers for the opportunity to participate and hope to see MLSEC develop further, both technically and ideologically.

Privacy predictions for 2021 https://securelist.com/privacy-predictions-for-2021/100311/ https://securelist.com/privacy-predictions-for-2021/100311/#respond Thu, 28 Jan 2021 10:00:13 +0000 https://kasperskycontenthub.com/securelist/?p=100311

2020 saw an unprecedented increase in the importance and value of digital services and infrastructure. From the rise of remote working and the global shift in consumer habits to huge profits booked by internet entertainers, we are witnessing how overwhelmingly important the connected infrastructure has become for the daily functioning of society.

What does all this mean for privacy? With privacy more often than not being traded for convenience, we believe that for many 2020 has fundamentally changed how much privacy people are willing to sacrifice in exchange for security (especially from the COVID-19 threat) and access to digital services. How are governments and enterprises going to react to this in 2021? Here are some of our thoughts on what the coming year may look like from the privacy perspective, and which diverse and sometimes contrary forces are going to shape it.

  1. Smart health device vendors are going to collect increasingly diverse data – and use it in increasingly diverse ways.
    Heart rate monitors and step counters are already a standard in even the cheapest smart fitness band models. More wearables, however, now come with an oximeter and even an ECG, allowing you to detect possible heart rate issues before they can even cause you any trouble. We think more sensors are on the way, with body temperature among the most likely candidates. And with your body temperature being an actual public health concern nowadays, how long before health officials want to tap into this pool of data? Remember, heart rate and activity tracker data – as well as consumer gene sequencing – has already been used as evidence in a court of law. Add in more smart health devices, such as smart body scales, glucose level monitors, blood pressure monitors and even toothbrushes and you have huge amounts of data that is invaluable for marketers and insurers.
  2. Consumer privacy is going to be a value proposition, and in most cases cost money.
    Public awareness of the perils of unfettered data collection is growing, and the free market is taking notice. Apple has publicly clashed with Facebook claiming it has to protect its users’ privacy, while the latter is wrestling with regulators to implement end-to-end encryption in its messaging apps. People are more and more willing to choose services that have at least a promise of privacy, and even pay for them. Security vendors are promoting privacy awareness, backing it with privacy-oriented products; incumbent privacy-oriented services like DuckDuckGo show they can have a sustainable business model while leaving you in control of your data; and startups like You.com claim you can have a Google-like experience without the Google-like tracking.
  3. Governments are going to be increasingly jealous of big-tech data hoarding – and increasingly active in regulation.
    The data that the big tech companies have on people is a gold mine for governments, democratic and oppressive alike. It can be used in a variety of ways, from using geodata to build more efficient transportation to sifting through cloud photos to fight child abuse and peeking into private conversations to silence dissent. However, private companies are not really keen on sharing it. We have already seen governments around the world oppose companies’ plans to end-to-end encrypt messaging and cloud backups, pass legislation forcing developers to plant backdoors into their software, or voice concerns with DNS-over-HTTPS, as well as more laws regulating cryptocurrency being enacted everywhere, and so on and so forth. But big tech is called big for a reason, and it will be interesting to see how this confrontation develops.
  4. Data companies are going to find ever more creative, and sometimes more intrusive, sources of data to fuel the behavioral analytics machine.
    Some sources of behavioral analytics data are so common we can call them conventional, such as using your recent purchases to recommend new goods or using your income and spending data to calculate credit default risk. But what about using data from your web camera to track your engagement in work meetings and decide on your yearly bonus? Using online tests that you take on social media to determine what kind of ad will make you buy a coffee brewer? The mood of your music playlist to choose the goods to market to you? How often you charge your phone to determine your credit score? We have already seen these scenarios in the wild, but we are expecting the marketers to get even more creative with what some data experts call AI snake oil. The main implication of this is the chilling effect of people having to weigh every move before acting. Imagine knowing that choosing your Cyberpunk 2077 hero’s gender, romance line and play style (stealth or open assault) will somehow influence some unknown factor in your real life down the line. And would it change how you play the game?
  5. Multi-party computations, differential privacy and federated learning are going to become more widely adopted – as well as edge computing.
    It is not all bad news. As companies become more conscious of what data they actually need and consumers push back against unchecked data collection, more advanced privacy tools are emerging and becoming more widely adopted. From the hardware perspective, we will see more powerful smartphones and more specialized data processing hardware, such as Google Coral, Nvidia Jetson and Intel NCS, enter the market at affordable prices. This will allow developers to create tools capable of doing fancy data processing, such as running neural networks, on-device instead of in the cloud, dramatically limiting the amount of data that is transferred from you to the company. From the software standpoint, more companies like Apple, Google and Microsoft are adopting differential privacy techniques to give people strict (in the mathematical sense) privacy guarantees while continuing to make use of data (a toy illustration of this idea follows the list). Federated learning is going to become the go-to method for dealing with data deemed too private for users to share and for companies to store. With more educational and non-commercial initiatives, such as OpenMined, surrounding them, these methods might lead to groundbreaking collaborations and new results in privacy-heavy areas such as healthcare.
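
As a toy illustration of the kind of guarantee differential privacy provides (our own sketch, not any vendor’s implementation), here is the classic randomized response technique: every individual answer is deniable, yet the aggregate statistic stays accurate:

    import random

    def randomized_response(true_answer, p_truth=0.75):
        # With probability p_truth report the truth, otherwise a coin flip
        if random.random() < p_truth:
            return true_answer
        return random.random() < 0.5

    def estimate_rate(noisy_answers, p_truth=0.75):
        # Invert the added noise to recover the population rate of "yes" answers
        observed = sum(noisy_answers) / len(noisy_answers)
        return (observed - (1 - p_truth) * 0.5) / p_truth

    # Example: 100,000 users, 30% of whom truly answer "yes"
    truths = [random.random() < 0.30 for _ in range(100_000)]
    noisy = [randomized_response(t) for t in truths]
    print(round(estimate_rate(noisy), 3))  # close to 0.30, no individual exposed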

We have seen over the last decade, and the last few years in particular, how privacy has become a hot-button issue at the intersection of governmental, corporate and personal interests, and how it has given rise to such different and sometimes even conflicting trends. In more general terms, we hope this year helps us, as a society, to move closer to a balance where the use of data by governments and companies is based on privacy guarantees and respect of individual rights.

Dox, steal, reveal. Where does your personal data end up? https://securelist.com/dox-steal-reveal/99577/ https://securelist.com/dox-steal-reveal/99577/#respond Tue, 01 Dec 2020 09:00:21 +0000 https://kasperskycontenthub.com/securelist/?p=99577

The technological shift that we have been experiencing for the last few decades is astounding, not least because of its social implications. Every year the online and offline spheres have become more and more connected and are now completely intertwined, leading to online actions having real consequences in the physical realm — both good and bad.

One of the most affected areas in this regard is communication and sharing of information, especially personal. Posting something on the internet is not like speaking to a select club of like-minded tech enthusiasts anymore — it is more akin to shouting on a crowded square. This gives rise to many unique threats, from cyberbullying and simple financial scams to spear phishing and social engineering attacks on business executives and government officials. And while awareness of privacy issues is increasing, much of the general public still only have a basic understanding of why privacy matters.

Unfortunately, even if we take good care of how and with whom we share our personal data, we are not immune from being doxed. Abusers may be motivated enough to go beyond gathering data available in the public domain and turn to the black market in the hope of finding personal information that will do real harm, for instance, by gaining access to social media accounts. In this report, we dig deeper into two major consequences of sharing personal data in public, whether willingly or not — doxing (the public de-anonymization of a person online) and the selling of personal data on the dark web — and try to untangle the connection between the two. We’ll also look at how these phenomena affect our lives and what challenges they present to users.

Key findings

  • Doxing is not something that only vulnerable groups or people with specific professions, such as journalists or sex workers, can be affected by. Anyone who voices an opinion online may potentially become a victim of doxing.
  • Almost any public data can be abused for doxing or cyberbullying, and there is practically no limit to the ways users can be harmed by their own data.
  • With increasing digitization of most aspects of our everyday lives, even more data is now shared with organizations and may end up in the hands of criminals. This now includes personal medical records and selfies with personal identification documents.
  • Access to personal data can start from as low as US$0.50 for an ID, depending on the depth and breadth of the data offered.
  • Some personal information is as much in demand as it was almost a decade ago – primarily credit card data and access to banking and e-payment services. The cost of this type of data has not fallen over time and that is unlikely to change.
  • Data sold on dark market websites can be used for extortion, executing scams and phishing schemes and direct stealing of money. Certain types of data, such as access to personal accounts or password databases, can be abused not just for financial gain but also for reputational harm and other types of social damage including doxing.

Unwanted spotlight: doxing

The increasing political and social division of recent years, combined with the perceived anonymity of the internet, exacerbates online dangers such as trolling and cyberbullying. At the intersection of these dangers and privacy threats lies the issue of doxing.

What is doxing?

Historically, doxing (also spelled doxxing) meant de-anonymizing a person on the internet, especially in early hacker culture, where people preferred to be known by their nickname (online handle). The meaning has evolved, however, to have a broader sense.

Doxing is, in a way, a method of cyberbullying. It occurs when a person shares some private information about another person without their consent to embarrass, hurt or otherwise put the target in danger. This can include sharing:

  • embarrassing photos or videos;
  • parts of private correspondence, probably taken out of context;
  • a person’s physical address, phone number, private email address and other private contacts;
  • occupation and job details;
  • medical or financial data, criminal records.

EXAMPLE: An example of a threat of doxing in this classic sense is the story of the anonymous blogger Slate Star Codex, who claimed that a New York Times journalist insisted on publishing his real name in a piece about him. This prompted the blogger to delete his blog. Luckily, the newspaper seems to have abandoned the idea.

Doxing also includes cases when data about the victim is already publicly available, but a perpetrator gathers and compiles it into an actionable “report”, especially if also inciting other people to act on it. For example, if a person voices an opinion on a divisive issue in a post on a social network, throwing in their phone number in the comments and suggesting people should call them at night with threats is still doxing, even if the phone number is available online in some database or on some other social network.

EXAMPLE: A journalist from Pitchfork, a US music outlet, received numerous threats on Twitter including suggestions to “burn her house” after her phone number and home address were published by Taylor Swift fans who were unhappy about a review of the singer’s latest album that wasn’t positive enough.

Why is doxing dangerous?

Compared to the physical world, information on the internet can spread very quickly and is almost impossible to remove once posted online. This makes doxing even more harmful.

The most common goal of doxing is to cause a feeling of stress, fear, embarrassment and helplessness. If you are caught in a heated argument on Twitter, and somebody posts your home address and suggests that people should hurt you, it naturally causes anxiety. Threats can also be directed at your relatives. The real danger, however, comes if someone decides to actually carry out the threats, which means doxing also threatens your physical safety — something that happens more often than you would think.

Besides posting your information online for everyone to see, attackers can share it in a targeted way with your relatives, friends or employer, especially if it is embarrassing. This can harm the victim’s relationships with their loved ones, as well as their career prospects.

EXAMPLE: After a service for finding people using only a photo gained popularity in VK, a popular social network in Russia, it was used to de-anonymize women who starred in porn movies or worked in the sex industry. The perpetrators specifically suggested sharing this information with their relatives. One of the victims was a school teacher who eventually lost her job.

Doxing scenarios

How can you be doxed? These are some common scenarios and how they can harm the victim:

  • Identifying the user and sharing information directly with their employer, which results in the person getting fired due to social pressure;
  • Leaking a user’s intimate photos or videos to the public (an activity often called ‘revenge porn’), a widespread method of maliciously attacking someone’s privacy that can have significant consequences for the victim;
  • Revealing the identity of anonymous bloggers, internet users, opinion leaders and creators, which can lead to real danger if the victim is in a hostile environment, for instance, opposition bloggers in certain countries or a person who supports unconventional views;
  • Outing the person and providing their personal details in the media when this information does not serve the public interest and may directly harm the person;
  • Gathering information about the account of a specific person (the potential victim) featured in sensitive or questionable content and sharing it with hostile groups or accounts that may engage in online or even offline violence against that person.

Social impact

Doxing is a very pressing matter in times of increasing social and political division. Doxing, as well as the threat of doxing, hampers freedom of speech and produces a chilling effect. It prevents people from voicing their opinions, which is harmful to democracy and healthy social debate.

Some people are more likely to be victims of doxing. Journalists, bloggers, activists, lawyers, sex industry workers, law enforcement officers all run a higher risk of being doxed. For women, doxing goes hand in hand with sexualized verbal abuse and threats. For law enforcement officers, it also means direct danger for their physical safety, especially for undercover officers. This can lead people to abandon their jobs.

High-profile internet personas are more at risk than average users, but that doesn’t mean “ordinary” people are safe from doxing. Being thought to have said something online or done something on camera that “upsets” a large group of people can attract excessive attention from online crowds and turn your life into a nightmare, even if you never actually said or did it in the first place.

EXAMPLE: Tuhina Singh, the CEO of a Singapore company, was doxed: her phone number and private email address were published online, resulting in insults and threats. The reason? She was misidentified as the woman in a viral video who refused to put on a mask amid the coronavirus pandemic.

The darknet database. How much do you cost?

Doxing is the result of abusing information that is available in the public domain and not used for financial gain. However, the threats to personal data and, hence, personal safety, do not end there. Aside from the publicly available data that we freely share and that can be gathered by anyone and used for malicious purposes, the organizations we share our data with don’t always handle it responsibly.

By definition, we do not expect this information to leak out into the public and even if it does, do not anticipate that it might harm us. According to recent research by Kaspersky, 37% of millennials think they are too boring to be the victim of cybercrime. The number of massive data leaks hit a new high this year and we no longer get surprised by yet another company being hacked and their customers’ data being leaked or used in ransom demands.

Efforts to better protect personal data are being made in a variety of countries, with governments imposing new directives to ensure protection and penalize irresponsible management of citizens’ data. New personal information protection directives such as the General Data Protection Regulation (GDPR) in the EU and Brazilian General Data Protection Act (LGPD), as well as increasing customer scrutiny towards data handling practices, have forced organizations to improve their security and take the data leakage threat more seriously.

However, that doesn’t mean the data is safe. In some cases, stolen data is used for ransom demands; in others, it is published out in the open. Sometimes it is a mix of both: the threat actors behind the Maze ransomware published stolen data if they did not succeed in getting the ransom money. But most of it ends up on the dark web as a commodity, and a very accessible one. Darknet forums and markets, essentially marketplaces for illegal physical and digital goods, are used by cybercriminals to sell services and products, from malware to personal data.

Our experts, who specialize in understanding what goes on in the dark web, researched the current state of data as a commodity on such platforms to find out what kind of personal data is in demand, what it is used for and how much it costs.

Methodology

For the purposes of this research we analyzed active offers on 10 international darknet forums and marketplaces that operate in English or Russian. The sample included posts that were shared during the third quarter of 2020 and that are still relevant.

Research findings – how much do you cost?

Covering all types of personal data offers on the dark web would turn this report into a short book, so it focuses on just some of the most popular categories available on dark markets. However, it is important to mention that the types of databases leaked and then sold on the dark web vary, which is unsurprising considering they are stolen from different institutions and organizations. Leaked databases can be cross-referenced and this way made even more valuable, as they present a fuller picture of a subject’s personal details. With that in mind, let’s dig into what is out there in the shadows that cybercrooks might have on you:

ID card data: $0.50 – $10

Identity documents or ID cards are the main means of identification in most countries, including the US and throughout Europe. Usually they are tied to the most important services, especially state services and contain sensitive information such as social security number (SSN) in the US. Though important, the cost of these documents on the black market is not that high and depends on how full the information is. For instance, information with a full name and insurance number will cost as little as $0.50 per person, while the price for a ‘full pack’ including ID number, full name, SSN, date of birth (DOB), email and mobile phone can reach up to $10 per person. The price also varies depending on the size of the purchase – data sold in bulk is cheaper per unit.

Purchasing 150 ID cards will cost as little as 50 cents per unit

Data from identity documents can be used for a variety of scams, filling out applications for specific services and gaining access to other sensitive information that can later be used for criminal purposes.

Sometimes the leaked databases contain much more than just ID info

Passport scans: $6 – $15

Passports are another type of identification document that are popular with cybercriminals. In some countries such as Russia, Ukraine and other former Soviet states they are used instead of IDs and are required to receive pretty much any government-related or financial service – from filing a complaint in a shop to taking out a loan. In other countries, passports can also be used for identification on international platforms such as cryptocurrency exchanges – or for international fraud.

This is the reason passport scans go around the web quite often – think of how many times you have uploaded a copy of your passport to some service, sent it to an organization or allowed them to scan it themselves.

Passport scans are more expensive than identification details with prices varying from $6 to $15 depending on the quality of the scan and the country of origin. Typically, there are two types of passport scans sold – a scan of the first page which, understandably, is cheaper than a scan of a full passport.

Passport copies for purchase can be selected by gender if required

Driver’s license scans: $5 – $25

Driver’s licenses are another type of identification document that is in demand in the shadows, primarily due to the growing number of services that one can register for using a driver’s ID. Typically, the information sold on forums includes a scan of the license with full information. Varying in price from $5 to $25, these can be used by cybercriminals for car rental, as an ID for local services or insurance fraud.

Selfie with documents: $40 – $60

Have you ever taken a selfie with your passport or ID? User identification is taken more seriously these days with organizations adhering to so-called know-your-customer (KYC) programs that require identity verification for various operations. For instance, cryptocurrency exchanges employ this practice to prevent money laundering by getting people extracting funds to confirm they are who they say they are. Social networks require selfies with documents when users need to recover access to their account and bank employees take pictures like these when delivering credit cards to clients’ homes.

Using stolen passport or ID selfies allows fraudsters to bypass KYC guidelines and continue to launder money. These documents can also be used for a whole variety of services – from car rentals to getting micro-loans or manipulating insurance companies. Such documents help cybercriminals carry out their schemes and even blackmail the people identified in them. As a result, this data is very valuable, costing from $40 to $60 per person.

Selfies with identification documents can be used to bypass a service’s security procedures

Medical records: $1 – $30

The world is becoming more digitized along with certain aspects of our lives that we never thought would go online. Take medical records, for instance – cybercriminals have laid their hands on them too. Looking back to 2012, when we analyzed different types of data available in the darknet, medical data was not even a thing. Now, however, this data is in demand as it can be used for a whole variety of fraudulent activities from obtaining health insurance services to purchasing regulated drugs. It can also be used to demand ransom. Recently, for instance, Vastaamo, a Finnish mental health organization, was hacked and the data of their patients, including children, stolen and later exposed on dark web markets, with at least two thousand patients affected. The hackers initially demanded a ransom payment to delete the information, but then switched their attention directly to the patients.

Leaks of medical information can become very unpleasant as they put the victims who are already vulnerable into an even more vulnerable position. The type of data shared on the darknet varies from a medical form with a full name, email, insurance number, and the name of the medical organization to a full medical record of a patient – with their medical history, prescriptions and more.

Medical records sold on darknet forums can vary from full information about a person to forms from medical institutions

Credit card details: $6 – $20

Credit card details fall under the category of most basic information stolen and used by cybercriminals. Full credit card information including the name, number and CVV code can be used to withdraw funds or purchase goods online and is valued at $6 to $20 per unit. Back in 2012, our evaluation put the price at $10, so on average the cost has remained more or less the same. The price for such data depends on the country of origin, the bank and, more importantly, on how large the purchase is, with ‘better’ value for bulk purchases. Of course, new anti-fraud banking systems are making life harder for cybercriminals, forcing them to constantly come up with new ways to cash out. Nevertheless, with credit card details being the starting point for most of these schemes, they are nowhere near becoming outdated.

Online banking and PayPal accounts: $50 – $500

Another type of financial data is online banking access and PayPal account information. Both provide direct access to the funds of the victims with PayPal being a sweet spot for the cybercriminals who want to launder their money and withdraw it without any security checks. Access to online banking is generally valued at between one and 10 percent of the funds available in the account, while PayPal accounts cost from $50 to $500 depending on the available credit and previous user operations.

Subscription services: $0.50 – $8

In the world of subscription-based entertainment, access to popular streaming, gaming or content platforms is in high demand. While little personal information is given away, losing access to one’s account on Netflix, Twitch or PornHub is not something that anyone would enjoy. Stolen subscription service credentials are not sold only on the dark web – they can be found on some shady regular forums too. The dark web usually has offers to purchase access details in bulk, which can later be sold individually to multiple customers. The price for access to such services varies from 50 cents to $8.

How much does your data cost?

  • Credit card details: $6 – $20
  • Driver’s license scans: $5 – $25
  • Passport scans: $6 – $15
  • Subscription services: $0.50 – $8
  • ID (full name, SSN, DOB, email, mobile): $0.50 – $10
  • Selfie with documents (passport, driver’s license): $40 – $60
  • Medical records: $1 – $30
  • Online banking account: 1–10% of the funds in the account
  • PayPal accounts: $50 – $500

Password databases

Leaks of password databases are among the most widespread data leaks. From retail loyalty cards to bank logins, such databases have been appearing on the dark web and even on the normal web for years, and they tend to end up in the public domain, accessible for a very small payment or sometimes entirely free. While most of these databases are outdated, they still represent a real danger. Users tend to use the same passwords across a number of platforms and accounts, often tying them all to the same email. Finding the right password for a specific account is often just a matter of time and effort, and as a result users are at risk of having more of their data compromised – from their social network accounts to their personal email or private accounts on adult websites. Access to other accounts can later be resold (as in the case of subscription services) or used for blackmail or scams.

Certain services aggregate leaked passwords and provide paid subscription-based or one-time access to their databases, as shown in the screenshot. The service in the screenshot allows one database check for 30 coins

Unauthorized access to email and social media: $400 – $800

With so much personal data available for sale, one may wonder whether purchasing access to someone’s email or social network account is as easy as obtaining IDs. Darknet operators do indeed offer to hack into specific accounts or emails, charging from $400 per account. However, the methods employed by those who offer such services are rather basic: they can only go as far as trying to guess the password, checking the account against existing leaked databases, or executing social engineering attacks to get the user to reveal their password. Improved security of social media and email services has also made these practices less successful, with two-factor authentication and various other measures protecting users better than before. As a result, most of these kinds of offers on dark markets are, ironically enough, actually scams against other cybercriminals.

Examples of forum advertisements offering to hack social media accounts and emails

Nevertheless, this doesn’t rule out the possibility of real targeted account hacking – more covert, technically complex methods are employed by experienced actors and these services usually cost a lot. For instance, the actor may identify a specific email of the potential victim, send a spear-phishing email prompting the target to download malware that will be able to collect information about the passwords and ultimately gain access to the targeted accounts. These services, however, are extremely expensive, time consuming and are usually executed by advanced threat actors against priority targets.

Key takeaways

In the course of this research we have witnessed a number of trends relating to stolen and repurposed personal data. Some personal information remains as much in demand as it was almost a decade ago – primarily credit card data and access to banking and e-payment services. The cost of this type of data has not fallen over time and that is unlikely to change.

Another big change is the type of data now available for sale. With the digitization of medical institutions, personal medical records, traditionally categorized as highly sensitive information, have become available for cybercriminals to use and abuse for financial gain. The current development and spread of telemedicine around the world is unlikely to reverse this trend, although we hope that after recent cases such as the Vastaamo hack, medical organizations will approach data collection and security with greater responsibility.

The growth in the number of photos of people with documents in their hand and various schemes exploiting them also reflects a trend in the cybergoods game and indicates that any data shared, even with organizations, can potentially end up in the hands of cybercrooks and abused for the purposes of financial gain. The repercussions of such data abuse are very real for the victims as they will have to deal with the loans taken out in their name or services used on the basis of their identity.

At the same time, there is some good news when it comes to the safety of personal accounts and gaining access to specific emails and social media accounts. With improved security measures employed at the industry level, targeting and hijacking a specific individual’s account is very costly, and in most cases, not doable. In this sphere there is evidence of an interesting dynamic of cybercriminals scamming each other, with most cybercrooks unable to deliver what they advertise. That does not, however, eliminate the threat entirely: provided they have the funds and their order is big enough, the criminals may still be able to buy what they want.

The overview of the types of data available on dark markets suggests that at least some of the offers might be of interest to especially determined doxers. While we believe that such cases are unlikely due to their cost right now, things might change depending largely on the determination of the abusers to dox an individual.

Protecting your data and yourself

With our ever-growing online presence and footprint, it is almost impossible to be completely anonymous online. A determined person with some computer skills, especially if they have access to privileged information (say, a private investigator or a law enforcement officer), will find at least some data about you given enough time.

EXAMPLE: Kevin Mitnick shares a story in his book “The Art of Invisibility” about how he managed to find out the SSN, city of birth and the mother’s maiden name of a reporter who thought she had a very minimal online presence (he did it with her consent). To do so, he used his access to a specialized web resource for private investigators. People who usually enjoy privacy can also be tracked to their homes if you have specific data: a geolocation dataset from a marketing company obtained by the New York Times in 2019 showed the GPS location over time of senior US government officials, policemen and even acquaintances of Johnny Depp and Arnold Schwarzenegger. 

This means that online privacy is almost always about assessing the risks that you face and taking appropriate measures to mitigate them. If you think you might anger a few low-caliber online trolls with a tweet, it is enough to hide your email address from your social network profile. If you are a political reporter covering extremist movements, you need more control of your digital footprint. Below, we describe a few basic steps that will be sufficient against doxing for an average internet user.

Know what they know

The first thing to do if you want to protect yourself against doxing is to research what the internet knows about you. Try googling your name, combining it with some other data about you, such as your place of residence or year of birth, to narrow down the results. Try searching for your online handles and emails as well. If your name is not very common, you can even subscribe to notifications from Google in case it pops up somewhere on the web.

Apart from Google, there are so-called people search engines such as BeenVerified that allow background checks to be conducted on people using open web data or government records. Publishing this kind of information online might be illegal depending on the country, so availability of such websites differs in various jurisdictions.

If you have public social media profiles, review the posts. Check if they contain geotags with places that you frequent, such as your home or office, or photos that can reveal their location. Of course, not all photos are dangerous, but the more specific they are, the more risk they carry. Scout your older posts for some more private data, such as names of your family members. If you have a private profile, check if you actually know all the people in your friend list.

Remember that, besides social networks per se, there are many other applications that have a social component and can reveal information about you, ranging from languages that you learn to your level of sexual activity. Pay special attention to apps that record some sort of geodata, such as fitness tracking applications. Check that your account in such apps is private.

EXAMPLE: In 2018, a security researcher noted that there were spots with a high level of activity in a dataset of user activity in Strava, a fitness app, in the Middle East. These spots, cross-referenced with Google Maps, gave away the location of US military bases in the region.

Finally, check if your data was leaked in data breaches. Leak monitoring is usually built into password managers and web browsers, but you can also use a dedicated service like HaveIBeenPwned. If your email is found in a leak, you can assume that any other information from the breached info is available somewhere (e.g. your home address if the breached service is a web store, or your favorite running routes if it is a fitness tracker).
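
For passwords specifically, the check can even be automated. The sketch below (our own example) queries the public HaveIBeenPwned “Pwned Passwords” range API, which uses k-anonymity, so only the first five characters of the password’s SHA-1 hash ever leave your machine:

    import hashlib
    import urllib.request

    def pwned_count(password):
        # Returns how many times the password appears in known breach corpora
        sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
        prefix, suffix = sha1[:5], sha1[5:]
        req = urllib.request.Request(
            "https://api.pwnedpasswords.com/range/" + prefix,
            headers={"User-Agent": "doxing-selfcheck-example"},
        )
        with urllib.request.urlopen(req) as resp:
            for line in resp.read().decode("utf-8").splitlines():
                candidate, _, count = line.partition(":")
                if candidate == suffix:
                    return int(count)
        return 0

    # print(pwned_count("password123"))  # a large number: never use or reuse such passwords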

Remove what you can

If you think that the information about you on the internet can be used against you, try to get rid of it. In the case of social networks, it is usually relatively easy: you either remove the posts with private data or make your profile private.

With other websites, check if you can just remove or disable your account. Otherwise, check if the website has a complaint or information removal form and use it. If not, try to contact the administrators directly. If your jurisdiction has strict data privacy laws, such as GDPR or CCPA, it is easier for a service to just remove your data than face a regulator and the threat of huge fines.

If some information is impossible to remove from the source, you can ask the search engine to remove links to websites containing your private data from search results by exercising the so-called right to be forgotten. Whether you can do so depends on the search engine and jurisdiction.

EXAMPLE: One of our researchers uses a smart watch with an application that tracks his physical activity and helps him monitor his progress when jogging. One day he was approached by another runner whom he didn’t know. It turned out that the stranger knew his name and where he runs, all thanks to this application, which not only tracked his data but also shared it in its internal social network. While this strange occurrence didn’t result in any harm, and the intention of the application was to help fellow runners meet each other, it is clear how knowledge of someone’s location and regular jogging route could be used against them by less friendly strangers.

Protect yourself

Doxing is most devastating when the data being published is private, i.e., cannot be found on the internet. An adversary can obtain this data by hacking into the accounts and services that you use. To minimize the risks of being hacked, follow these simple rules:

  • Never reuse your passwords across accounts. Use a unique password for each account and a password manager to store them (a short sketch of how such passwords can be generated follows this list).
  • Protect your devices with fingerprint/face scan or with a PIN or password.
  • Use two-factor authentication. Remember that using an application that generates one-time codes is more secure than receiving the second factor via SMS. If you need additional security, invest in a hardware 2FA key.
  • Beware of phishing email and websites.
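
As mentioned in the first item of the list above, unique passwords are best generated rather than invented. The snippet below (our own illustrative sketch of what a password manager does under the hood, with an arbitrary length and alphabet) uses Python’s standard secrets module:

    import secrets
    import string

    def generate_password(length=20):
        # Cryptographically secure random choice over letters, digits and punctuation
        alphabet = string.ascii_letters + string.digits + string.punctuation
        return "".join(secrets.choice(alphabet) for _ in range(length))

    # print(generate_password())  # store the result in a password manager, not a sticky note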

If you are ready to invest a bit more effort into protecting your privacy, here are some additional ways to protect your personal information or check if your passwords or data have become compromised without your knowledge:

  • Think twice before you post on social media channels. Could there be unforeseen consequences of making your views or information public? Could content be used against you or to your detriment now or in the future?
  • To make sure people close to home, including family, friends or colleagues, can’t access your devices or accounts without your consent, never share passwords even if it seems like a good idea or convenient to do so. Writing them on a sticky note next to your screen might be helpful for you, but it may also help others to access things you don’t want them to.
  • Ensure you always check permission settings on the apps you use, to minimize the likelihood of your data being shared or stored by third parties – and beyond – without your knowledge. You might end up giving consent by default, so it is always worth double-checking before you start using an app or service.
  • There is no substitute for strong and robust passwords. Use a reliable security solution like Kaspersky Password Manager to generate and secure unique passwords for every account, and resist the temptation to re-use the same one over and over again.
  • Password managers also allow personal data to be stored in an encrypted private vault where you can upload your driver’s licenses, passports/IDs, bank cards, insurance files and other valuable documents and manage them securely.
  • To find out if any of the passwords you use to access your online accounts have been compromised, use a tool such as Kaspersky Security Cloud. Its Account Check feature allows users to inspect their accounts for potential data leaks. If a leak is detected, Kaspersky Security Cloud provides information about the categories of data that may be publicly accessible so that the individual affected can take appropriate action.

When it is too late

If you have fallen victim to doxing, you can try to contact the moderators of the website where your data was leaked or flag the posts with your data on the social network to have it removed before the information spreads.

Note that usually the goal of doxers is to cause the victim stress and psychological discomfort. Do not engage with trolls, make your accounts private and seek comfort with your friends, relatives and offline activities. It takes a short time for an online mob to give up on their victim and move on if you do not give them additional reasons to attack you.

However, if you receive threats or fear for your physical safety, you might want to contact law enforcement. In this case, remember to document what is going on, for example by taking screenshots of the threats, to provide law enforcement officers with additional evidence.

To sum up: take good care of yourself and your data

The digital world provides us with endless opportunities to express our individuality and share our stories, but we need to make sure it is a safe place to express ourselves. As this research shows, our data is valuable not only to us but also to many others with malicious intentions, ranging from people who simply dislike your views to cybercriminals who thrive on profiting from personal data. That’s why it’s crucial to know how to protect it.

An important point to remember here is that cybercriminals are not the only ones who can use our data to cause harm – with new phenomena such as doxing, users need to be aware that they can never know how someone can capitalize on their data. Approaching personal data sharing with responsibility is a must-have skill nowadays that will help keep us safer in the storms of the digital world.
