A peek into Reddit's anti-spam internals

2026-06-27

¦ reddit

5 years ago, back when I still used Reddit, something unusual happened. My app of choice, Relay for reddit, was bombarding me with a bunch of weird notifications about removed spam.

Getting these notifications wasn’t unusual in and of itself - I was a moderator of a few fairly small subreddits that’d from time to time get posts automatically removed for spam. However, when I went to actually look at the removed spam, I saw something I was never meant to see.

I saw Reddit’s anti-spam internals.

so that's about it.

Removed: spamurai (*Removing potential spam content from unproved user*: comment `t1_pupp13` (0.7294469 perspective spam) by u/GoodBoyBacon (0.06 days old, spammy: 11, hosted: false, -1 karma, 4 reports, org: `ComcastCable`, email: gmail.com) in r/GoodBoysOnly (guest) posting nil from `oauth.reddit.com` via `nil` from UA: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.30, RHS: oc:ac:kT:lw:bV:aX:af:a6:l5:y3:aT:m9:pt:f3:hZ:az:aR:aQ, LANG: en-US,en,q=0.9, TLS: j7bXVc3l/qer8FRj2aEiqOrx1ro=DDZ0TViWlY5HYgOPw1SZqDxwiO8= - referrer: https://www.reddit.com/, thumbnail: `` - ) • GoodBoyBacon • 1 points • 27 min

You see u/BadGuy67? He's the same guy as https://www.reddit.com/r/ReallyBadGuys/comments/qw3rt1/if_ur_a_bad_guy_post_here_please/

Removed: Reddit (shadowban applied on 10-27-2021) • GoodBoyBacon • 0 points • 1 hr

I'm not the same guy as that other guy please read my comment

How Reddit moderation works

So Reddit is a site made up of smaller sub-communities, which are called subreddits. For example, /r/mylittlepony is a subreddit for fans of My Little Pony. These subreddits can be created by anyone, and they are moderated by a group of community moderators appointed by the creator of the subreddit.

If we go¹ on /r/mylittlepony we can see the list of moderators on the sidebar:

MODERATORS

MESSAGE THE MODS
Orschmann
optimistic_outcome
Chinch335
IllusionOf_Integrity
spokesthebrony
TheeLinker
Lankygit
Raging_Mouse
Searchbar_Trixie
gbeaudette
...and more

These moderators can remove posts, ban users, manage modmail, etc., but they are just normal Reddit users.

If you’re a moderator you can see who removed a post or comment:

Removed: rebane2001 • ExampleUser • 1 points • 1 hr

I'm breaking the rules 😈

This includes the automod - a rules-based moderation system:

Removed: AutoModerator • ExampleUser • 1 points • 1 hr

bad word

But then you’ll sometimes also see the mysterious “Auto”:

Removed: Auto • ExampleUser • 1 points • 1 hr

This is what happens when something gets caught in Reddit’s mysterious spam filters, or when Reddit’s sitewide admins remove something manually.

In the moderator log, they’ll show up as “reddit” and “Anti-Evil Operations”:

ExampleSubreddit: moderation log

filter by action: filter by moderator:

13 days ago	Anti-Evil Operations	removed link "[ Removed by Reddit ]" by EvilPoster
27 days ago	Anti-Evil Operations	removed link "[ Removed by Reddit ]" by Rule_Breaker1337
1 month ago	reddit	removed link "buy my shirt" by xXx_ShirtSeller_xXx
1 month ago	reddit	removed link "sexy ladies" by SpamBot_00018341
2 months ago	Anti-Evil Operations	removed link "[ Removed by Reddit ]" by PuppyGirlHater

These sitewide spam removals are what the rest of this post is going to be about.

Oopsie

What happened to me back in 2021 was that, due to some kind of error on Reddit’s side, the usual Removed: Auto text had been replaced with the actual removal reason. Why this happened to me I do not know - it went back to normal after an hour or so. All I was left with was a bunch of screenshots I managed to take while this stuff was still going on.

But that doesn’t mean we can’t speculate!

Up until 2017, Reddit’s source code was publicly available. Of course, a lot has changed since then, but we can still analyze the archived code and hypothesize what might be happening.

The function responsible for moderator removals is POST_remove:

def POST_remove(self, thing, spam):
    """Remove a link, comment, or modmail message."""
    ...
    admintools.spam(thing, auto=False,
                    moderator_banned=not c.user_is_admin,
                    banner=c.user.name,
                    train_spam=train_spam)

We can see it calls admintools.spam with a few arguments, notably: moderator_banned, which marks whether something was removed by a moderator or an admin, and banner, which notes down the username of whoever did the ban action.

Poking around a bit more, we find the get_mod_attributes function:

# Comments added by me for the blogpost
def get_mod_attributes(item):
    data = {}
    # If user is logged in and a moderator
    if c.user_is_loggedin and item.can_ban:
        data["num_reports"] = item.reported
        data["report_reasons"] = Report.get_reasons(item)

        ban_info = getattr(item, "ban_info", {})
        # If post was removed
        if item._spam:
            data["approved_by"] = None
            # If post was removed by a mod
            if ban_info.get('moderator_banned'):
                # Show the banner name
                data["banned_by"] = ban_info.get("banner")
            else: # else, if post was removed by an admin
                # Hide the banner name
                data["banned_by"] = True
        else:
            data["approved_by"] = ban_info.get("unbanner")
            data["banned_by"] = None
    else:
        data["num_reports"] = None
        data["report_reasons"] = None
        data["approved_by"] = None
        data["banned_by"] = None
    return data

This is the part of the API that actually returns the information about removals to us - the banner in ban_info is the red text I was seeing in Relay. And it seems like it will only get returned if the removal was by a moderator, not an admin. But where does that Auto text come from? Reddit’s API only returns an actual username, or True.

Turns out that it’s actually coming from Relay² itself:

// reddit/news/oauth/reddit/model/base/RedditLinkComment.java
if (this.bannedBy.equalsIgnoreCase("true")) {
    this.bannedBy = "Auto";
} else if (this.bannedBy.equalsIgnoreCase("null")) {
    this.bannedBy = "";
}

Okay, that explains that. But where am I getting these internal messages from?

Well, it seems like Reddit is re-using the banner field for internal removal reasons:

def POST_submit(self, form, jquery, url, selftext, kind, title,
                sr, extension, sendreplies, resubmit):
    """Submit a link to a subreddit."""
    ...
    if not is_self:
        ban = is_banned_domain(url)
        if ban:
            g.stats.simple_event('spam.domainban.link_url')
            admintools.spam(l, banner = "domain (%s)" % ban.banmsg)
            hooks.get_hook('banned_domain.submit').call(item=l, url=url,
                                                        ban=ban)

The above code snippet runs whenever a new link is posted. It checks whether the link’s domain is banned, and if it is, it removes the post with the banner set to “domain (REASON)”.

We can see it in action with this removed post for example:

I_EAT_PONIES in MyLittleOutOfContext

Conga!

Removed: domain (banned as an experiment to see what happens with tubmlr spam ring. - em 5/31/12) • 0 Comments • 24.media.tumblr.com • 9 yrs

Seems like em was playing around with auto-removing all tubmlr [sic] links on Reddit in 2012?

Anyways, it seems like Reddit is stuffing its internal spam removal reasons in the banner field, but making it so that only sitewide admins can see them. And something in a codepath similar to get_mod_attributes was broken for an hour or so, allowing me to see those reasons.
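
To make that hypothesis concrete, here is a minimal Python sketch of both halves: a simplified stand-in for the banned_by branch of get_mod_attributes, and the label mapping Relay does. The hide_admin_banner flag is a hypothetical switch added only to simulate the suspected bug; nothing in the archived source suggests such a switch exists.

```python
def banned_by_for_moderator(is_spam, ban_info, hide_admin_banner=True):
    """Simplified stand-in for the banned_by branch of get_mod_attributes.

    hide_admin_banner is hypothetical: setting it to False simulates the
    suspected bug where the moderator/admin check is skipped.
    """
    if not is_spam:
        return None
    if ban_info.get("moderator_banned") or not hide_admin_banner:
        return ban_info.get("banner")
    return True  # removed by an admin or a filter: hide the reason


def relay_label(banned_by):
    """Mirror of the Relay snippet: True becomes "Auto", None becomes ""."""
    if banned_by is True:
        return "Auto"
    if banned_by is None:
        return ""
    return banned_by


ban_info = {"moderator_banned": False, "banner": "domain (spam)"}
normal = relay_label(banned_by_for_moderator(True, ban_info))
leaked = relay_label(banned_by_for_moderator(True, ban_info, hide_admin_banner=False))
print("Removed:", normal)  # Removed: Auto
print("Removed:", leaked)  # Removed: domain (spam)
```

With the check in place a moderator sees Removed: Auto; with it skipped, the internal reason falls straight through to the client.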

Let’s take a look at the kinds of reasons I managed to get a glimpse of!

domain (2012 - present)

The first category is the domain removals, as shown earlier. Nearly all of these are just Removed: domain (spam), though I did find this gem in there:

presafur in MyLittleOutOfContext

Just register and look for me here* h9OI5WUQZPL

Removed: domain (le sexxxxy sex spam) • 0 Comments • www.example.com • 5 yrs

Perhaps I’m just childish, but I find the idea of someone going le sexxxxy sex spam while working on a spamfilter rather amusing.

Reddit probably had some issues with Tumblr spam, because in addition to the tubmlr removal we saw earlier there was also this:

JackofH3art in MyLittleOutOfContext

It hurts so good.

Removed: domain (ban - 11/12/12 mg ) • NSFW • 0 Comments • bartl3by.tumblr.com • 8 yrs

I’m quite certain that this removal was targeted at Tumblr in general, and not the specific blog linked, since bartl3by.tumblr.com seems to be a legitimate (although somewhat perverse) blog.
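
If the ban really was on Tumblr as a whole, the lookup would have to match parent domains as well as the exact host. A minimal sketch of that idea follows. find_domain_ban and BANNED_DOMAINS are hypothetical names, attaching the message to tumblr.com is an assumption, and this is not the archived is_banned_domain implementation.

```python
# Hypothetical ban table. The message is the one from the removal above;
# that it was attached to tumblr.com as a whole is an assumption.
BANNED_DOMAINS = {"tumblr.com": "ban - 11/12/12 mg"}


def find_domain_ban(hostname, banned=BANNED_DOMAINS):
    """Return (matched_domain, message) for the host or any parent domain."""
    labels = hostname.lower().split(".")
    # Try the full host first, then strip one leading label at a time,
    # stopping before the bare top-level domain.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in banned:
            return candidate, banned[candidate]
    return None


print(find_domain_ban("bartl3by.tumblr.com"))  # ('tumblr.com', 'ban - 11/12/12 mg')
print(find_domain_ban("example.org"))          # None
```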

I believe domain removals are the only type of anti-spam we can actually see in the public Reddit source code. Though, even that is partially hidden.

spammit (2012 - present)

The next category is spammit, which somehow analyses a post and gives it a percentage rating:

Kyderra in MyLittleOutOfContext

I'm very fondling of you

Removed: spammit(72.98% spammy) • 0 Comments • dashie.mylittlefacewhen.com • 8 yrs

Yes, there’s no space between spammit and the parenthesis.

The percentages of removed posts were generally fairly high, with the lowest one being 39.71% spammy and highest one 98.19%.

That being said, spammit doesn’t seem like a very accurate anti-spam measure for the subs I moderate because it seemed to hit a lot of legitimate Imgur posts with a 70-98% spammy rating.

bans (2016 - present)

Next, we have post removals for banned users.

kaitlynwwrettin in MyLittleOutOfContext

Cost Reduction & Cost Saving Consultants | Puppygirl Consulting

Removed: banned user • 0 Comments • www.example.com • 3 yrs

Some of them are marked with just a Removed: banned user, though others get a fancy Removed: Reddit (banall performed).

KerryVinebt403 in MyLittleOutOfContext

casino online

Removed: Reddit (banall performed) • 0 Comments • example.com • 3 yrs

The posts I saw being removed like this were all very obvious spam. Mostly just ads for all kinds of services. I suspect this is the admins seeing an obvious spambot and just nuking it from orbit.

shadowbans (2016 - present)

It’s known that Reddit shadowbans its users. A shadowban is a silent ban where seemingly nothing happens to your account and you’re still able to post/comment, but nobody else will be able to see your posts and comments. In fact, there’s even a subreddit for checking whether you’re shadowbanned.

But now we can actually see what a shadowban looks like to admins:

pickertramontana in trixiemasterrace

Blonde Teen Takes A Massive Meow In Her Bark

Removed: Reddit (shadowban applied on 11-06-2019) • 0 Comments • self.trixiemasterrace • 1 yr

I’m not going to share the specific conversation here, but there was a really funny comments thread going on where a person was blaming mods for removing all of their comments while in reality being shadowbanned by Reddit.

spamurai (2020 - present)

Now we get to the most interesting part of the entire spam filter thing. Unlike spammit, spamurai is a system that does have some public references to it. According to slide #28, Reddit uses Minsky for “ML”, and Spamurai for “Rules”. I’m not sure how this factors into the removal reasons, so I’m just going to ignore it and assume everything is spamurai.

First up, there seems to be some sort of a spamurai subsystem called echelon. It seems to remove certain keywords, such as the EqG elsagate spam seen below, and lewd (OF? Snapchat?) stuff like puppyvids.69.

mypham71375 in Pony_irl

Equestria Girls Princess Animation Series - Twilight Sparkle Cutie Mark ...

Removed: spamurai (echelon: Equestria Girls Princess Animation Series) • 1 Comments • youtube.com • 2 yrs

Then, there are some targeted removals, such as this one that targets clothing spam.

Removed: spamurai (approval required on hyperlink comment from high spam score account (suspected shirt affiliate spam)) • Adventurous-ties • 1 points • 5 months

Dog-Women consulting company will open the Ukrainian pharmaceutical market for you!

And some more general rules-based filters, such as this one for account age.

Removed: spamurai (comment from account under 30 minutes matching spam conditions) • NewUser67 • 1 points • 23 min

fuck you fuck you fuck you

But alright, let’s try to figure out what’s going on with the infodump removals like the one I put in the banner art of this post.

AnywhereAlone6851 in Pony_irl

18 Random Facts That Will Blo

Removed: spamurai (*Removing potential spam content from unproved user*: link `t3_phc4xx` (0.12571795 perspective spam) by u/AnywhereAlone6851 (2.948587962963 days old, spammy: 4.5, hosted: false, 28 karma, 5 reports, org: `Skyinfo Online`, email: gmail.com) in r/Pony_irl (guest) posting pinterest.com from `oauth.reddit.com` via `nil` from UA: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36, RHS: oc:ac:kT:lw:bV:aX:af:a6:l5:y3:aT:m9:pt:f3:hZ:az:aR:aQ, LANG: en-US,en,q=0.9, TLS: SwxwvfHLtTxt/9qbo1dvBLEMSIQ=tT1LosI8/xDmUS7LMVuhb/olIJQ= - referrer: https://www.reddit.com/, thumbnail: `https://b.thumbs.redditmedia.com/K_Q91r66a3AEopEbzGkjkxHOpisoQbxa3hIoHxDerjc.jpg` - ```18 Random Facts That Will Blo ``` https://www.reddit.com/r/Pony_irl/comments/phc4xx/18_random_facts_that_will_blo/ ) • 0 Comments • pinterest.com • 1 month

That is a lot of information in there! Let’s break it down bit by bit:

link t3_phc4xx: this is the “fullname” ID of the post, it’s what shows up in urls except it is prefixed: t1 is comment, t2 is user, t3 is post, t4 is private message, and t5 is subreddit.
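
As a small illustration of that prefix scheme, a fullname can be split into its kind and bare ID like this. The helper name and the KINDS table are made up for this sketch and only cover the five prefixes listed above.

```python
# Prefixes as listed above; other prefixes are not covered by this sketch.
KINDS = {
    "t1": "comment",
    "t2": "user",
    "t3": "post",
    "t4": "private message",
    "t5": "subreddit",
}


def parse_fullname(fullname):
    """Split a fullname such as 't3_phc4xx' into (kind, bare_id)."""
    prefix, _, bare_id = fullname.partition("_")
    return KINDS[prefix], bare_id


print(parse_fullname("t3_phc4xx"))  # ('post', 'phc4xx')
print(parse_fullname("t1_pupp13"))  # ('comment', 'pupp13')
```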

0.12571795 perspective spam: this is almost certainly using the Perspective API. Perspective is a free³ Google⁴ service that uses machine learning to “reduce toxicity online”.

I’m sure of this because perspective is a pretty unique word, the Perspective docs display sample results with similar score numbers (e.g. 0.24173126 and 0.4445836), and Perspective’s case studies page contains this quote from the CTO of Reddit:

“As Reddit scales, the integrity of our platform and ensuring healthy discourse among users and communities remains a priority. Perspective has been a valuable tool as we continue to strengthen the safety measures and tooling that we have in place.”

—Chris Slowe, Chief Technology Officer at Reddit

It seems like Reddit is using Perspective’s “experimental” SPAM attribute here though, which is intended to detect spam instead of toxicity. The data for this is trained on a SINGLE DATASET of the comments and moderation in the New York Times, which I find pretty interesting.

Unfortunately, since February 2026, we can no longer create a new Perspective API project on Google Cloud, so it is not possible to try it out anymore.

Well, that is unless we can find some leaked API keys :3. Which I may or may not have teehee..

$ curl 'https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=AIzac29ycnkgdGhpcyBpcyBjZW5zb3JlZCBsb2w' \
    --request POST \
    --header "Content-Type: application/json" \
    --data '{
              "comment": {
                "text":"18 Random Facts That Will Blo "
              },
              "requested_attributes": {
                "SPAM": {"score_type": "PROBABILITY"}
              }
            }'
{
  "attributeScores": {
    "SPAM": {
      "spanScores": [
        {
          "begin": 0,
          "end": 30,
          "score": {
            "value": 0.12571794,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.12571794,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}

..and thus, we can be 100% sure that this is the API Reddit used, because we get back the same⁵ 0.12571795 as we saw in spamurai earlier.

This is interesting because it means that this entire time it was possible for a bad actor to bypass one of the primary spamurai criteria by just changing their message until it’s non-spammy for Perspective’s free API.

It’s not even hard to do so, as SPAM score is extremely sensitive to changes of just a few characters:

$ query='Puppygirl Consulting is the best way to grow your revenue'
$ ./perspective.sh "$query"
0.8638981: Puppygirl Consulting is the best way to grow your revenue
$ for letters in {a..z}{a..z}
    do ./perspective.sh "$query $letters" | grep "0.0"
  done
0.010811162: Puppygirl Consulting is the best way to grow your revenue qp

You can see how going through all 2-letter combinations got us from an 86% spam score down to 1%, which is significantly less than pretty much any normal text.

It also ignores numbers and case for some reason:

$ ./perspective.sh 'Hi there, please call my work phone at 567890'
0.81438655: Hi there, please call my work phone at 567890
$ ./perspective.sh 'hi THEre, pleaSE Call my woRk phonE aT 022102'
0.81438655: hi THEre, pleaSE Call my woRk phonE aT 022102

As well as alternate alphabets:

$ ./perspective.sh 'привет'
0.35077864: привет
$ ./perspective.sh 'наххуи'
0.35077864: наххуи

Which means you can sometimes lower your spam score just by using Cyrillic characters:

$ query='Buy my product'
$ ./perspective.sh "$query"
0.6473346: Buy my product
$ ./perspective.sh "$(echo -n $query | sed s/p/р/)"
0.4452748: Buy my рroduct
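
The perspective.sh helper used in these examples isn't shown. A rough Python equivalent might look like the sketch below, assuming the helper simply wraps the same comments:analyze request as the curl call earlier and prints "score: text". The function names are made up for illustration, and the API key is whatever you pass in.

```python
import json
import urllib.request

ENDPOINT = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_spam_request(text, api_key):
    """Build the same POST request as the curl example, without sending it."""
    body = {
        "comment": {"text": text},
        "requested_attributes": {"SPAM": {"score_type": "PROBABILITY"}},
    }
    return urllib.request.Request(
        ENDPOINT + "?key=" + api_key,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summary_score(response):
    """Pull the overall SPAM score out of a decoded JSON response."""
    return response["attributeScores"]["SPAM"]["summaryScore"]["value"]


def perspective(text, api_key):
    """Send the request and format the result like the shell helper's output."""
    request = build_spam_request(text, api_key)
    with urllib.request.urlopen(request, timeout=10) as resp:
        score = summary_score(json.load(resp))
    return f"{score}: {text}"
```

Only perspective() touches the network; the other two functions can be checked offline against the JSON response shown earlier.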

Anyways, moving on…

by u/AnywhereAlone6851: username, self-explanatory

2.948587962963 days old: account age in days, which is a pretty good indicator of spam accounts and ban evaders. But it does give us one interesting detail - I believe the account age is stored in seconds, because all the examples I have come out to a round number when multiplied by 86400 (the number of seconds in a day).
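
The arithmetic is easy to check with the two ages that appear in the infodumps quoted in this post. The helper name is made up; the numbers are copied from the removal messages.

```python
SECONDS_PER_DAY = 86400


def whole_second_error(days):
    """Return (nearest whole second, distance from it) for an age in days."""
    seconds = days * SECONDS_PER_DAY
    nearest = round(seconds)
    return nearest, abs(seconds - nearest)


# Account ages from the two infodumps quoted in this post.
for days in (2.948587962963, 0.06):
    nearest, error = whole_second_error(days)
    print(f"{days} days -> {nearest} s (off by {error:.6f})")
```

Both land on whole seconds (254758 and 5184) to within floating-point noise, which is what you would expect if the value were an integer number of seconds divided by 86400.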

spammy: 4.5: not sure what this is, could be the Minsky thing from earlier? Or the spammit score from earlier? Or a combination of multiple spamurai rules?

hosted: false: not sure, maybe to detect known hosting provider ip ranges?

28 karma: self-explanatory, karma is often used as a measure of an account’s presence

5 reports: total number of reports an account and its posts have received

org: Skyinfo Online: the user’s ISP. This can tell you where the user is from and whether they’re using a VPN. In this case we can see that the spam is coming from Bangladesh, because that’s where SkyInfo Online operates from. Their website is incredible.

email: gmail.com: e-mail domain of the user

in r/Pony_irl (guest): the subreddit that the post is in. I believe the (guest) means that the user is not a subscriber of the subreddit. I assume that it would say something like “member”, “approved”, or “moderator” in other cases, but I don’t actually have any examples of that.

posting pinterest.com: the domain that’s being linked to

from oauth.reddit.com via nil: the user was authenticated with Reddit’s oauth flow, which is the default, and I believe the nil (Lua’s null) is where the name of a custom client would go, e.g. Relay in my case. I don’t have any examples of this being anything other than nil, so this is just speculation.

from UA: Mozilla/5.0 (Win…: this is the user agent string of the browser that’s being used. It tells us that the person is posting from the Chrome browser on Windows 8.1.

RHS: oc:ac:kT:lw:bV:aX…: this seems to be some sort of a fingerprinting hash Reddit uses. I believe this is Reddit’s own engineering and not an existing open-source solution. This hash is the exact same between this Chrome 93 example, and the Edge 95 example from the beginning. This leads me to conclude that the hash fingerprints browsers (Edge and Chrome are both Chromium) and is meant to detect scripts pretending to be a browser.

LANG: en-US,en,q=0.9: the value of the accept-language header, which tells websites which languages you’d like pages served in. This can be used to detect potential VPN usage, e.g. if someone has Latvian as their language but is joining from a New York IP.

TLS: SwxwvfHLtTxt/9qbo…: this is TLS fingerprinting similar to JA3. It seems to be Reddit’s own engineering though, not an existing implementation.

referrer: https://www.reddit.com/: this is the page the user got onto Reddit from. Sometimes when opening Reddit links directly from other sites, your votes are not counted to discourage brigading, and this is what the referer is used for. In the case of spamurai it might be useful if the referer is something like buy-reddit-comments.info (or more realistically, a platform such as Fiverr).

thumbnail: https://…: the auto-generated thumbnail

- ```18 Random Facts That Will Blo ```: the markdown body of the post/comment

https://www.reddit.com/r/Pony_irl…: link to the post/comment
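
Putting the breakdown together, the first part of the message is regular enough to pull apart with a regular expression. This is only a sketch fitted to the two infodumps quoted in this post: the group names are labels chosen here, and the real format may well have variants that never showed up in my screenshots.

```python
import re

# Fitted to the two infodumps quoted in this post; group names are labels
# chosen for this sketch, not anything Reddit is known to use.
INFODUMP = re.compile(
    r"(?P<kind>link|comment) `(?P<fullname>t\d_[0-9a-z]+)` "
    r"\((?P<perspective>[\d.]+) perspective spam\) "
    r"by u/(?P<user>[\w-]+) "
    r"\((?P<age_days>[\d.]+) days old, spammy: (?P<spammy>[\d.]+), "
    r"hosted: (?P<hosted>true|false), (?P<karma>-?\d+) karma, "
    r"(?P<reports>\d+) reports, org: `(?P<org>[^`]*)`, "
    r"email: (?P<email>[^)]+)\) "
    r"in r/(?P<subreddit>\w+) \((?P<role>\w+)\) "
    r"posting (?P<domain>\S+) from `(?P<host>[^`]*)` via `(?P<client>[^`]*)`"
)


def parse_infodump(reason):
    """Return the named fields as a dict of strings, or None if no match."""
    match = INFODUMP.search(reason)
    return match.groupdict() if match else None


SAMPLE = (
    "*Removing potential spam content from unproved user*: link `t3_phc4xx` "
    "(0.12571795 perspective spam) by u/AnywhereAlone6851 "
    "(2.948587962963 days old, spammy: 4.5, hosted: false, 28 karma, 5 reports, "
    "org: `Skyinfo Online`, email: gmail.com) in r/Pony_irl (guest) "
    "posting pinterest.com from `oauth.reddit.com` via `nil`"
)

fields = parse_infodump(SAMPLE)
print(fields["fullname"], fields["org"], fields["domain"])
```

Everything after the via field (user agent, RHS, LANG, TLS, referrer, thumbnail, body) is left out here, since free-form values like the user agent make it harder to split reliably.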

So that’s the full spamurai infodump with no clear reason for removal. There are also examples of spamurai clearly using the same data but with specific rules, such as the use of the spammy score here:

Removed: spamurai (URL-only comment from account with high spammy score) • GoodBoyBacon • 0 points • 24 min

https://www.reddit.com/r/ReallyBadGuys/comments/qw3rt1/if_ur_a_bad_guy_post_here_please/

Or the use of the perspective score here:

Removed: spamurai (REPORT: High spam perspective score on comment with hyperlink reported for spam. Removed but can be re-approved by mod.) • BrattyErmine12 • 0 points • 11 months

Coins are a virtual good you can use to award exemplary posts or comments. Support Reddit and encourage your favorite contributors to keep making Reddit better.

GET COINS

Here’s what you can buy with coins

Spend your coins on these Awards reserved exclusively for the finest Reddit contributors. Awarding a post or comment highlights it for all to see, and some Awards also grant the honoree special bonuses.

📷

Silver Award

Shows a Silver Award on the post or comment and ... that’s it. You’ll need 100 Coins.

The perspective spam score for the above post is either 0.9761621 or 0.9782609⁶. Also what’s interesting is that the specific rule there got triggered by someone reporting it for spam - thus we learn that sometimes user reports have an effect even without moderator intervention.

It’s also interesting how some of the removals adjust based on mod actions:

muyuwobsoq9q in Pony_irl

영어 배우기! 알파벳송 인기- تعليم الاطفال مع - العاب أطفالأغاني الحضانة وأغنية الأط...

Removed: spamurai (High karma-to-spam ratio on link content from 6+ spammy score account; mod approval of this content will reduce future removals) • 1 Comments • youtube.com • 2 yrs

misc

There’s also a bunch of removals that don’t really neatly fit into any of the above categories.

For example, Pinterest redirect links get removed:

Removed: pinterest redirect • 22_ghost_22 • 1 points • 4 months

https://pin.it/Sc4mUr1

As do mega.nz links:

Removed: streamer spam • EPIC_Gamer67 • 1 points • 11 months

https://mega.nz/folder/Ep1cV1d30s

The decryption key is

SW52YWxpZCBiYXNlNjQgc3RyaW5n

In the case of the comment above, it was actually a legitimate link to some archived YouTube videos, so it was wrongly removed.

Another banned kind of link is a freely available subdomain:

cpsryan in UnusAnnusArchival

All of the Sauce for Unus Annus Archives

Removed: freely available subdomains • (Unus Annus Archiving) • 0 Comments • self.UnusAnnusArchival • 11 months

In the above case the post didn’t contain those kinds of links per se, but it did contain a magnet link, and Reddit found and linkified the 2ftracker.opentrackr.org⁷ inside of it. I’m not sure why opentrackr gets matched under “freely available subdomains” though.

But speaking of trackers, certain strings are straight up regex banned:

e4e5x0q8e1p in MyLittleOutOfContext

레인보우 대시 프레젠츠 26화 토렌.트 150927 26화 torrent HD 고화질 FULL 레인보우 대시가 선사하는 26회 토렌.트 150927 26화 다시보기

Removed: Matched forbidden regex u'torenteu' • 2 Comments • self.MyLittleOutOfContext • 10 yrs

Now, this one is super interesting to me because nowhere in the post does the string torenteu appear, yet we still somehow match our regex?

This happens because Reddit uses the unidecode library⁸ to convert post titles into ASCII:

$ python2
Python 2.7.18 (default, Dec  9 2024, 19:35:20)
[GCC 9.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import unidecode
>>> unidecode.unidecode(u"레인보우 대시 프레젠츠 26화 토렌.트 150927 26화 torrent HD 고화질")
'reinbou daesi peurejenceu 26hwa toren.teu 150927 26hwa torrent HD gohwajil'
>>>

It then processes the string a bit more and arrives at "reinbou_daesi_peurejenceu_26hwa_torenteu_150927", which does match the u'torenteu' regex.
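
That extra processing looks a lot like URL-slug generation. The sketch below is a guess at it: whitespace to underscores, non-word characters dropped, lowercased, then truncated to 50 characters at a word boundary. The regexes and the 50-character limit are assumptions, picked because they reproduce the string quoted above from the unidecode output.

```python
import re

# Output of unidecode from the session above.
TRANSLITERATED = (
    "reinbou daesi peurejenceu 26hwa toren.teu 150927 26hwa torrent HD gohwajil"
)


def slugify(title, max_length=50):
    """Guess at the extra processing: a URL-slug style cleanup."""
    title = re.sub(r"\s+", "_", title)  # whitespace -> underscores
    title = re.sub(r"\W+", "", title)   # drop punctuation such as "."
    title = re.sub(r"_+", "_", title).strip("_").lower()
    if len(title) > max_length:
        title = title[:max_length]
        cut = title.rfind("_")          # truncate at the last full word
        if cut > 0:
            title = title[:cut]
    return title


slug = slugify(TRANSLITERATED)
print(slug)                               # reinbou_daesi_peurejenceu_26hwa_torenteu_150927
print(bool(re.search("torenteu", slug)))  # True
```

Dropping the dot is what turns toren.teu into torenteu, so the attempt to dodge a "torrent" filter with a stray period ends up producing the exact string the regex is looking for.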

I was curious as to whether this filter still exists, so I made some test posts on a subreddit I moderate using an alt account:

popstonia

overview comments submitted

sorted by:

there doesn't seem to be anything here

It’s hard to say? It seems like the string “torenteu” by itself does not get removed, so I assume the other removals are based on various other kinds of spam heuristics?

Something I did find interesting is that UA-12345678- got removed, but UA-49307539- did not! It’s interesting because there used to be a filter for that specific phrase too:

Removed: Failed inspection: Phrase(s) [u'UA-49307539-'] • c4c3u5o8c7n • 0 points • 11 months

다시보기 강아지들 토렌.트 torrent 토렌 DVD 1080p 720p HD Full HD DVD 1080p MKV

강아지들 토렌.트 file

1080p MKV 다시보기 강아지들 토렌.트 토렌.트 토렌 Torrent Comprehensive 720p HD

Coverage aggregated from sources all 토렌.트 파일 (Torrent) :

파일 받기 : 다시보기 강아지들 토렌.트 Torrent

Though, this case is a little more curious than just that. Once again, the removal phrase does not appear in the comment, but this time not even after running the text transformations!

The trick here is that the comment contains a link that goes through several redirects and then ends up on some Korean forum. And looking at the source code of said forum:

<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-49307539-2', 'auto');
  ga('send', 'pageview');

</script>

Aha! So the “inspection” means that Reddit literally opens the URL, follows redirects, and looks for the pattern on the page. In this case, the pattern matched is a Google Analytics ID, so that if the same spam ring were to change their IP and domain, the spam filter would still catch them.
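
A minimal sketch of what such an inspection step could look like, assuming it is nothing more than a substring search over the fetched page source. The function names are hypothetical, and the phrase is the placeholder used in this post, not a real ID.

```python
import urllib.request

# Placeholder phrase from this post, not a real Analytics ID.
FORBIDDEN_PHRASES = ["UA-49307539-"]


def failed_phrases(html, phrases=FORBIDDEN_PHRASES):
    """Return the forbidden phrases that appear anywhere in the page source."""
    return [phrase for phrase in phrases if phrase in html]


def inspect_url(url):
    """Fetch a URL (urlopen follows redirects by default) and inspect it."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return failed_phrases(html)


page = "<script>ga('create', 'UA-49307539-2', 'auto');</script>"
print(failed_phrases(page))  # ['UA-49307539-']
```

Because the phrase ends at the trailing dash, it matches every property number under the same Analytics account, not just the -2 seen on this particular page.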

I wanted to try this on my own account, so I put the string <pre>UA-49307539-2</pre> on a website and posted a link to it on Reddit.

This account has been banned

My test account (5 years old!) got banned immediately, and all of its post history got wiped too. RIP /u/popstonia.

For this reason, I changed the real number in this blogpost to UA-49307539-, which is in reality a random number - I would rather not put a piece of text out there that can kill people’s accounts through just posting it.

I tried to recreate the ban with a friend’s account which had a little more history on it, and that one ended up being fine. So I’m guessing this string only killed my test account because it was already a certain level of suspicious for the anti-spam filters.

I don’t actually know for sure whether the filter is still active, or whether my account getting nuked was a coincidence, but I’m choosing not to publicize the specific string used just to be safe.

Alright so this next one has a pretty interesting removal message:

Redstoner7 in UnusAnnusArchival

Massive Torrent of Unus Annus content

Removed: spam https://www.reddit.com/message/messages/8edha7 • (Unus Annus Archiving) • 0 Comments • self.UnusAnnusArchival • 11 months (edited 11 months)

The post here isn’t all that interesting, but what is interesting is that it got immediately removed. I have no idea why the removal message would link to a specific Reddit PM? Is this the message that was sent to the user? Was it sent to admins? Was it sent to modmail? Is it a DMCA message?

I wanted to get to the bottom of this so I did some OSINTing and tracked down the person who made that post, Aria. I asked her to check the message link, but it turns out that it was not sent to her account. So this got me even more curious.

Now, something we do have is the id of the message - 8edha7. Just like the other ids on Reddit, it is sequential. This meant I could figure out when this linked message was sent based on the messages in my own message history. And this message appears to land in the latter half of May 2017!
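
The dating trick can be sketched in a few lines. Reddit IDs are base-36 numbers, so they can be compared as integers and bracketed between messages whose dates you know. The inbox entries below are made up purely for illustration; in practice they would be IDs and dates from your own message history.

```python
import bisect


def id_to_int(reddit_id):
    """Reddit IDs are base-36 numbers, so they compare as integers."""
    return int(reddit_id, 36)


def bracket(target_id, known):
    """Return the dated messages immediately before and after target_id.

    known is a list of (id, date) pairs from your own inbox.
    """
    ordered = sorted((id_to_int(i), i, d) for i, d in known)
    numbers = [n for n, _, _ in ordered]
    pos = bisect.bisect_left(numbers, id_to_int(target_id))
    before = ordered[pos - 1][1:] if pos > 0 else None
    after = ordered[pos][1:] if pos < len(ordered) else None
    return before, after


# Made-up inbox entries for illustration only.
inbox = [("8e0000", "date A"), ("8f0000", "date B")]
print(id_to_int("8edha7"))       # 507872959
print(bracket("8edha7", inbox))  # (('8e0000', 'date A'), ('8f0000', 'date B'))
```

The closer together the two bracketing messages are, the tighter the estimate for when the mystery message was sent.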

I still don’t know what this message is and why it is linked in the removal reason, but it is a rather old message from before the Reddit account or the subreddit were even created.

neynime in MyLittleOutOfContext

Sex video chat with Russian girls. Free registration. DtAyqbIm

Removed: Janitor russian girls chat: Submitted by banned user neynime • 0 Comments • example.com • 5 yrs

I’m not sure what’s up with this one. Like obviously it’s just sex spam or whatever, but what’s up with that removal message? Is it talking about Janitor russian girls who chat, or is there a reddit janitor who did the removal? Why is it submitted by banned user? Is this like a banall from before that was a thing? So many questions, and unlike the previous post I can’t even reach out to anyone to ask about it.

Okay, but there’s one more removal I found pretty interesting, and it’s this one here:

Removed: some pages have personal info - 11/15/12 mg • gnbman • 1 points • 8 years

https://encyclopediadramatica.se/thumb/8/8a/Woll_Smoth_original.jpg/180px-Woll_Smoth_original.jpg

For those out of the loop, Encyclopedia Dramatica is a parody wiki site centered around internet culture and making fun of people. It’s pretty much like if 4chan was in charge of Wikipedia. A lot of the pages are pretty mean to their subjects, sometimes - as you might deduce from the removal message above - to the point of digging up and documenting their personal history.

Thus, it seems like mg decided to ban the entire domain and auto-remove any links to it. I believe this is noteworthy as it is the only removal here that is not just spam, but instead a legitimate website that Reddit did not like the content of.

Reddit engineering

So that was all I was able to deduce from what I saw myself, but as it turns out, Reddit has been writing about their anti-spam systems too!

There’s a post from 2023 on /r/RedditEng titled Protecting Reddit Users in Real Time at Scale that talks about internal systems called Rule-Executor-V1 (REV1), REV2, and Snooron.

The timeline is a bit messy, but how I understand it is that REV1 was created in 2016, then Snooron was developed in 2021 to modernize REV1, and two years later everything was migrated to REV2? I wonder if that migration is what led to me seeing the admin removal messages back in 2021.

Both REV1 and REV2 run off of Lua rules such as this:

if body_match("some bad text") then
  action(user)
end

This leads me to believe that REV1 is what we know as spamurai. The timeline seems to match, and we’ve seen spamurai emit strings such as “nil” that you’d expect from Lua.
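
For readers more at home in Python than Lua, the shape of such a rule might be sketched like this. Everything here is hypothetical: body_match is a stand-in modelled on the Lua snippet above, the rule borrows the wording of one of the removal reasons seen earlier, and what "matching spam conditions" really checks is unknown.

```python
def body_match(body, pattern):
    """Hypothetical stand-in for the Lua helper: case-insensitive substring test."""
    return pattern.lower() in body.lower()


def run_rules(post, rules):
    """Return the removal reason of the first rule that fires, else None."""
    for reason, predicate in rules:
        if predicate(post):
            return "spamurai (%s)" % reason
    return None


# One illustrative rule; the body check is an assumed "spam condition".
RULES = [
    (
        "comment from account under 30 minutes matching spam conditions",
        lambda p: p["account_age_minutes"] < 30
        and body_match(p["body"], "some bad text"),
    ),
]

post = {"account_age_minutes": 23, "body": "Some BAD text here"}
print(run_rules(post, RULES))
```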

There have been fairly recent user reports of posts getting removed by the users /u/Safety_Spamurai and /u/bot-bouncer, so the spamurai name is still at least somewhat in use, even for REV2 or snooron.

But we also saw removals such as u'torenteu' and u'UA-49307539-', which are clearly Python 2.7 unicode strings. The former was way before 2016, so that makes sense, but what about the latter removal that we only saw in like 2020?
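The reasoning here boils down to fingerprinting: Python 2 prints the repr of a unicode string with a u prefix, and Lua prints a missing value as nil. A minimal sketch of that heuristic, assuming nothing about Reddit's code (guess_runtimes and both patterns are made up for this example, and they will happily misfire on ordinary text that contains the word "nil"):

```python
import re

# Hypothetical heuristics for guessing which runtime formatted a leaked
# string. Illustrative only, not anything Reddit is known to use.
PY2_UNICODE_REPR = re.compile(r"\bu'[^']*'")  # e.g. u'torenteu'
LUA_NIL = re.compile(r"\bnil\b")              # Lua prints missing values as nil


def guess_runtimes(message):
    """Return the set of runtimes whose fingerprints appear in message."""
    hints = set()
    if PY2_UNICODE_REPR.search(message):
        hints.add("python2")
    if LUA_NIL.search(message):
        hints.add("lua")
    return hints


print(guess_runtimes("posting nil from `oauth.reddit.com` via `nil`"))
print(guess_runtimes("u'torenteu'"))
```

Note that u/GoodBoyBacon does not trip the Python 2 pattern, since the u there is followed by a slash rather than a quote.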

Well, REV1 also ran on Python 2.7, so I think there are two possible conclusions: either the REV1 code calls out to the URL inspection code written in Python 2.7, or the inspection code is entirely separate from REV1/spamurai. I suspect the latter, because all of the spamurai removal messages seem to be prefixed with “spamurai”.

I also learned that, according to this talk, snooron runs on Flink Stateful Functions, classifies posted images, runs OCR on said images, and uses Python3 for its workers.

I also found this Australian eSafety PDF which lists Reddit as using, as of 2024, the Hive AI for OCR and image/video classification, but also the Google Vision OCR API.

They explain that Hive’s OCR supports 12 languages, and thus they also need Google’s OCR to support a lot more of them. They also mentioned that they’re working on an internal tool that would support 80 languages.

Though, the text classification itself is done in-house using snooron. Snooron also has internal image hash-matching functionality. I don’t know whether this is just using existing anti-abuse/anti-terrorism hash databases, or if it’s also Reddit’s own hashes for common spam and such.
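For a feel of what image hash-matching can look like, here's a minimal sketch of one common family: an average hash compared by Hamming distance. This is an assumption for illustration. Nothing in the sources says which hashing scheme Snooron uses, and average_hash, hamming, matches_known and the max_distance threshold are all made-up names and values. It also assumes the image has already been downscaled to a tiny grayscale grid.

```python
def average_hash(pixels):
    """Hash a small grayscale grid (rows of 0-255 ints) into an integer.

    Each pixel contributes one bit: 1 if brighter than the mean, else 0.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def matches_known(pixels, known_hashes, max_distance=2):
    """True if the image's hash is close to any hash in the known set."""
    h = average_hash(pixels)
    return any(hamming(h, k) <= max_distance for k in known_hashes)


known = {average_hash([[0, 255], [255, 0]])}
print(matches_known([[10, 250], [240, 5]], known))  # slightly altered copy
print(matches_known([[255, 0], [0, 255]], known))   # inverted image
```

Unlike a cryptographic hash, this kind of hash tolerates small edits such as recompression, which is why near matches are compared by distance rather than equality.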

Going back in time, I also found this ticket from 2009 where spez[A] confirms that a user called crm114 is a spam filter that can be trained by moderators. CRM114 is an old piece of open-source spam classification software that, among other things, lets you “train” it with data to make its detection more reliable.

This is also why the admintools.spam method in Reddit’s source code has a train_spam keyword - it decides whether the anti-spam filters should be trained off of the performed moderator action. So, approve good posts in your sub if you want fewer false positives?
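To make the "training" idea concrete, here's a toy trainable filter. It is a sketch under loud assumptions: it uses plain word-count naive Bayes rather than anything CRM114 actually does, and ToyTrainableFilter, moderate and their signatures are hypothetical. Only the train_spam keyword name is borrowed from the text above.

```python
import math
from collections import Counter


class ToyTrainableFilter:
    """Word-count naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.docs[label] += 1
        self.words[label].update(text.lower().split())

    def spam_score(self, text):
        """Log-odds that text is spam; positive leans spam."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) or 1
        totals = {k: sum(c.values()) for k, c in self.words.items()}
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for word in text.lower().split():
            p_spam = (self.words["spam"][word] + 1) / (totals["spam"] + vocab)
            p_ham = (self.words["ham"][word] + 1) / (totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score


def moderate(filt, text, is_spam, train_spam=True):
    """Hypothetical mod action: optionally feed the decision back in."""
    if train_spam:
        filt.train(text, "spam" if is_spam else "ham")
    return is_spam


filt = ToyTrainableFilter()
moderate(filt, "cheap pills buy now", is_spam=True)
moderate(filt, "my pony drawing", is_spam=False)  # an approval trains "ham"
print(filt.spam_score("buy cheap pills") > 0)
print(filt.spam_score("pony drawing") < 0)
```

With train_spam=False the action still happens but the filter learns nothing from it, which is the distinction the keyword seems to be drawing.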

Why now?

So why release all this information now and not 5 years ago? I believe the information in this post, if released back in 2021, would’ve been catastrophic for Reddit’s spam issues. I don’t care too much about large companies, but covering internet forums in spam is not something I strive to do. In 2026 however, I believe this information is no longer dangerous to publicly share.

First of all, the Perspective API is shutting down by the end of this year. I doubt Reddit is still using this API, and even if they are, they’ll have to migrate off of it soon anyways. Secondly, there’s that elephant in the room. LLMs have changed the game and revolutionized… the spam industry. And thus, I think it’s safe to assume that Reddit has had to overhaul a lot in their anti-spam systems to make it work in the year of 2026.

afterword

hiii! probably not the blogpost you were expecting, but hopefully a fun one nonetheless! ^^

as usual, i did the whole “handwritten html/css, no images, no external resources, no javascript” thing for this post too (46kB gzipped btw!), but while recreating the old reddit ui i was pleasantly surprised by just how nice its code is! it feels like it was written by someone who actually loves html and css and wants to give me a warm hug. i was amused by the css actually using the orangered color by name, a rare sight these days!

anyways, some other updates - many of you are probably awaiting my x86css blog post, and it is hopefully coming out at some point, but in the meanwhile i gave a talk about the project at css day (which was a really fun event!!). unfortunately, the recordings for the talk will initially be behind a paywall, but they should become public eventually. i’m also trying to get the same talk accepted at 40c3, in which case the recordings will be available immediately. besides that, i’ll likely be doing a few other talks this year too - check my slides page for up to date info.

other than that i’m really hoping to host x3ctf again this year. we’re still not sure when it is happening, but i think we are all aiming to make it happen before the end of the year.

thank you so much for reading <3

If you’d like to reach out, feel free to message me on my socials or at lyra.horse [at] gmail.com.


In this post I will default to visiting Old Reddit pages with a logged-in account. Not all information may be visible on New Reddit or as a logged-out user.
Relay for reddit is closed-source, but unobfuscated. The code snippet shown is a decompilation.
It seems free to me. The FAQ states that “Perspective API is free and entirely self-service”. I did not look through their Privacy Policy, so I don’t know whether they sell the data, but the API does have a doNotStore parameter: “Do not store the comment or context sent in this request. By default, the service may store comments/context for debugging purposes.”
Perspective API is made by Jigsaw, an incubator within Google, and Google’s Counter Abuse Technology team.
We got back 0.12571794, which is technically off by 0.00000001 from 0.12571795, but I believe this is just a rounding error.
I’m not sure how Reddit processes their markdown before shoving it into the Perspective API, so I tried two variants.
The domain, of course, is actually tracker.opentrackr.org, but it appears in the magnet link as udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce, so the 2f from %2f gets prepended to the domain.
Unidecode is a library that attempts to represent unicode strings as ASCII text, which I believe is called romanization for languages such as Korean. For the demo in the blogpost I had to get version 1.2.0 and manually install it, as it is the last version to support Python 2.7.