WALA Preview - Web Access Log Analyzer

Started by shawnb61, Sep 03, 2025, 08:42 PM

shawnb61

Not approved yet, but I figured I'd share with the gang over here.

New mod I've been working on:
https://github.com/sbulen/SMF-Web-Access-Log-Analyzer

Release 1.0.0 download:
https://github.com/sbulen/SMF-Web-Access-Log-Analyzer/releases/download/v1.0.0/WALA-100.zip

I've been helping folks facing bot attacks over at SMF, and have come to realize that most folks don't have a way to analyze their web access logs.  E.g.: Are these my users???  Which ASNs in Brazil are crawling me?  Do I have users there?

So I wrote a little mod that uses the free DB-IP lite databases (https://db-ip.com/db/lite.php) to assign country & ASN to both the smf_members table and a web access log.  It includes a bunch of canned reports that run against that data.

The toughest challenge was that these are all fairly big files.  So...  I break them all up into small chunks & use the fetch API to upload them to the server.  Also, a web access log can easily have a few hundred thousand rows of data, and assigning attributes to all of those rows can take a HUGE amount of time.  I figured out a way to work in small IP ranges and build lookups in memory instead of using DB joins.  So for the most part, the uploads & attribute assignments take just 2-5 minutes each.  (Via DB lookups they were more like 15-40 minutes each.)
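To illustrate the general idea of in-memory lookups instead of DB joins, here is a rough Python sketch; it is not WALA's actual implementation. It assumes the DB-IP data has been loaded as (first IP, last IP, value) rows that don't overlap, sorts them once per IP version, and binary-searches each log IP. Since log rows repeat the same IPs a lot, each distinct IP is looked up only once. `RangeLookup` and `assign_attributes` are hypothetical names, and the sample rows use documentation-only addresses and ASNs, not real DB-IP data.

```python
import bisect
import ipaddress


class RangeLookup:
    """In-memory IP range table searched with binary search (one table per IP version)."""

    def __init__(self, rows):
        # rows: iterable of (first_ip, last_ip, value); assumed not to overlap.
        self.tables = {4: [], 6: []}
        for first, last, value in rows:
            lo, hi = ipaddress.ip_address(first), ipaddress.ip_address(last)
            self.tables[lo.version].append((int(lo), int(hi), value))
        for table in self.tables.values():
            table.sort()
        # Separate list of range starts so bisect can search it directly.
        self.starts = {v: [start for start, _, _ in t] for v, t in self.tables.items()}

    def find(self, ip):
        addr = ipaddress.ip_address(ip)
        n = int(addr)
        table = self.tables[addr.version]
        # Index of the last range whose start is <= n.
        i = bisect.bisect_right(self.starts[addr.version], n) - 1
        if i >= 0 and n <= table[i][1]:
            return table[i][2]
        return None  # IP falls in a gap between ranges


def assign_attributes(ips, lookup):
    # Look each distinct IP up once; dict.fromkeys keeps first-seen order.
    return {ip: lookup.find(ip) for ip in dict.fromkeys(ips)}


# Documentation-only ranges and ASNs (RFC 5737 / RFC 3849 / RFC 5398), not real DB-IP data.
lookup = RangeLookup([
    ("192.0.2.0", "192.0.2.255", "AS64496"),
    ("198.51.100.0", "198.51.100.255", "AS64497"),
    ("2001:db8::", "2001:db8::ffff", "AS64498"),
])
log_ips = ["198.51.100.7", "192.0.2.44", "198.51.100.7", "203.0.113.9", "2001:db8::1"]
result = assign_attributes(log_ips, lookup)
print(result)
```

Keeping IPv4 and IPv6 in separate tables matters here: as plain integers, IPv4 addresses would collide with the low end of the IPv6 space.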

Apache combined log format is required...

Dave


Skhilled

Wow! Great! I know it was a challenge but I sure hope that you get it approved.  ;D

Now, I'll have to make sure my logs are in that format when I get a chance.

shawnb61

An Apache web access log entry is usually a space-delimited row of text.  Text enclosures are usually double quotes ", with square brackets [] used as an alternate enclosure for the datetime.  Blank fields are represented as a hyphen -.  An example:
52.173.235.85 - - [09/Sep/2025:00:00:47 -0400] "GET /smf/index.php?topic=30136.0 HTTP/2.0" 403 285 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot" 0 0 "on:TLSv1.3:TLS_AES_256_GCM_SHA384" 332 18454 194.255.155.15 wwwvoogerfoogercom - - 52.173.235.85
Breaking down this entry:

Field 1, IP: 52.173.235.85
Field 2, Client identity (identd), usually blank: -
Field 3, Requester (authenticated user ID), usually blank: -
Field 4, Datetime: [09/Sep/2025:00:00:47 -0400]
Field 5, Request: "GET /smf/index.php?topic=30136.0 HTTP/2.0"
Field 6, HTTP status: 403
Field 7, Size of response in bytes: 285
Field 8, Referrer: -
Field 9, Useragent: "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot"
Field 10+, Vary per host; in this example, they appear to include connection cipher info, among other things: 0 0 "on:TLSv1.3:TLS_AES_256_GCM_SHA384" 332 18454 194.255.155.15 wwwvoogerfoogercom - - 52.173.235.85

WALA only uses the first nine fields; the rest are not loaded.
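For anyone who wants to pull those same nine fields out with their own script (e.g., to check whether their logs are in the right format), here's a rough Python sketch. It's not WALA's code. It assumes any quotes inside a quoted field are backslash-escaped, as Apache does by default, and treats a lone hyphen as blank. `parse_combined` and the field names are made up for illustration.

```python
import re
from datetime import datetime

# One field = a "double-quoted" string (with \" escapes), a [bracketed] string,
# or a run of non-space characters.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|\[[^\]]*\]|\S+')
FIELDS = ["ip", "client", "requester", "datetime", "request",
          "status", "size", "referrer", "useragent"]


def parse_combined(line):
    """Parse the first nine fields of an Apache combined log line; ignore the rest."""
    tokens = TOKEN.findall(line)
    if len(tokens) < 9:
        return None  # not combined format
    record = {}
    for name, tok in zip(FIELDS, tokens[:9]):
        # Strip the enclosure, if any.
        if len(tok) >= 2 and (tok[0], tok[-1]) in (('"', '"'), ("[", "]")):
            tok = tok[1:-1]
        record[name] = None if tok == "-" else tok  # hyphen means blank
    if record["datetime"] is not None:
        record["datetime"] = datetime.strptime(record["datetime"], "%d/%b/%Y:%H:%M:%S %z")
    for key in ("status", "size"):
        if record[key] is not None:
            record[key] = int(record[key])
    return record


SAMPLE = ('52.173.235.85 - - [09/Sep/2025:00:00:47 -0400] '
          '"GET /smf/index.php?topic=30136.0 HTTP/2.0" 403 285 "-" '
          '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; '
          'ChatGPT-User/1.0; +https://openai.com/bot" 0 0 '
          '"on:TLSv1.3:TLS_AES_256_GCM_SHA384" 332 18454 194.255.155.15 '
          'wwwvoogerfoogercom - - 52.173.235.85')
rec = parse_combined(SAMPLE)
print(rec["ip"], rec["status"], rec["useragent"])
```

Anything after the ninth field (the per-host extras) is simply ignored, same as WALA does.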


Skhilled


shawnb61

New version uploaded - v1.0.6.  This version adds the ability to generate & download CIDR lists.

https://custom.simplemachines.org/index.php?mod=4442

You can enter lists of country codes &/or ASNs, and a text file containing all CIDRs for the requested countries & ASNs will be downloaded.  You can also enter a command to be prepended to each line, e.g., 'Deny from', in case you want to add these lines to your .htaccess.
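Roughly how a range-to-CIDR conversion can work, sketched in Python as an assumption rather than the mod's actual code: the IP data is assumed to be stored as arbitrary first/last address ranges, which often don't line up with a single CIDR block, so each range is split into the fewest CIDR blocks that cover it exactly, and the optional command is prepended to each line. `cidr_lines` is a hypothetical name, and the sample ranges are documentation-only addresses standing in for a country's or ASN's rows.

```python
import ipaddress


def cidr_lines(ranges, command=""):
    """Yield one text line per CIDR block needed to cover each (first, last) range."""
    prefix = f"{command} " if command else ""
    for first, last in ranges:
        lo, hi = ipaddress.ip_address(first), ipaddress.ip_address(last)
        # Splits an arbitrary range into the fewest CIDR networks;
        # both ends must be the same IP version (IPv4 or IPv6).
        for net in ipaddress.summarize_address_range(lo, hi):
            yield f"{prefix}{net}"


# Documentation-only ranges, standing in for the rows of a requested country or ASN.
ranges = [
    ("192.0.2.0", "192.0.2.255"),
    ("198.51.100.0", "198.51.100.10"),
    ("2001:db8::", "2001:db8::ffff"),
]
lines = list(cidr_lines(ranges, "Deny from"))
print("\n".join(lines))
```

Because `cidr_lines` is a generator, the lines can be written straight to a file without holding a huge list in memory. Note that `Deny from` is Apache 2.2-style syntax (on 2.4 it needs mod_access_compat); Apache 2.4's native equivalent is `Require not ip`, placed inside a `<RequireAll>` block.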

Use caution...  This can easily generate some massive text files...  If you encounter resource issues, you will need to break your request up into smaller pieces.  (Though, it's kinda fun entering whole swaths of the planet & watching your ISP sweat...)

IPv4 & IPv6 are supported.  As always with all my mods, MySQL & PostgreSQL are supported.

I always felt this was a tool missing in our arsenal.  I hope folks find it helpful.


Skhilled

As soon as I get this stupid server sorted, I'll give it a try. :)