
WALA Preview - Web Access Log Analyzer

Started by shawnb61, Sep 03, 2025, 08:42 PM


shawnb61

Not approved yet, but I figured I'd share with the gang over here.

New mod I've been working on:
https://github.com/sbulen/SMF-Web-Access-Log-Analyzer

Release 1.0.0 download:
https://github.com/sbulen/SMF-Web-Access-Log-Analyzer/releases/download/v1.0.0/WALA-100.zip

I've been helping folks facing bot attacks over at SMF, and have come to realize that most folks don't have a way to analyze their web access logs. E.g., are these my users? Which ASNs in Brazil are crawling me? Do I have users there?

So I wrote a little mod that uses the free DB-IP Lite database (https://db-ip.com/db/lite.php) to assign country and ASN to both the smf_members table and a web access log. It has a bunch of canned reports against that data.
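For anyone curious how a country/ASN assignment like this can work without a DB join, here's a minimal Python sketch. It is not WALA's own code. It assumes the lookup data has been reduced to start_ip,end_ip,value rows that don't overlap, sorts them once, and uses a binary search (bisect) to find the range containing an address. The three CSV rows are made up, and the names `ip_key`, `load_ranges` and `lookup` are hypothetical; check the real DB-IP Lite files for their exact column layout. Keys are (IP version, integer) pairs so IPv4 and IPv6 ranges can't collide.

```python
import bisect
import csv
import io
import ipaddress

# Made-up rows in an ASSUMED start_ip,end_ip,value layout (not real DB-IP data).
SAMPLE_CSV = """\
1.0.0.0,1.0.0.255,AU
52.160.0.0,52.191.255.255,US
177.0.0.0,177.255.255.255,BR
"""


def ip_key(ip):
    """Sortable key: (version, integer) so IPv4 and IPv6 ranges never mix."""
    addr = ipaddress.ip_address(ip)
    return (addr.version, int(addr))


def load_ranges(text):
    """Parse start,end,value rows into a sorted list plus a list of start keys."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue  # skip blank or short lines
        start, end, value = row[:3]
        rows.append((ip_key(start), ip_key(end), value))
    rows.sort()
    starts = [r[0] for r in rows]
    return starts, rows


def lookup(ip, starts, rows):
    """Binary-search for the range containing ip; return its value or None."""
    k = ip_key(ip)
    i = bisect.bisect_right(starts, k) - 1  # last range starting at or before k
    if i >= 0 and rows[i][0] <= k <= rows[i][1]:
        return rows[i][2]
    return None
```

With the sample rows, `lookup("52.173.235.85", *load_ranges(SAMPLE_CSV))` returns "US", and an address outside every range returns None. Each lookup is O(log n) in memory; since a log usually repeats the same addresses many times, caching the result per distinct IP would cut the work further.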

The toughest challenge was that these are all fairly big files. So... I break them all up into small chunks and use the fetch API to upload them. Also, a web access log can easily have a few hundred thousand rows of data, and assigning attributes to all of those rows can take a HUGE amount of time. I figured out a way to work in small IP ranges and build lookups in memory instead of using DB joins. So for the most part, the uploads and attribute assignments take just 2-5 minutes each. (Via DB lookups they were more like 15-40 minutes each.)
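The chunking idea can be sketched too. WALA does its uploads from the browser with the fetch API; this Python sketch only illustrates one way to cut a big file into pieces that end on a newline, so no log row gets split across two uploads. The function name `iter_chunks` and the chunk size are hypothetical choices, not the mod's.

```python
def iter_chunks(data: bytes, max_bytes: int):
    """Yield consecutive pieces of data that end on a newline.

    Each piece is at most max_bytes long, except when a single line is
    longer than max_bytes: that line is then yielded whole on its own.
    The last piece may lack a trailing newline if the data does.
    """
    if max_bytes < 1:
        raise ValueError("max_bytes must be positive")
    start, n = 0, len(data)
    while start < n:
        end = min(start + max_bytes, n)
        if end < n:
            cut = data.rfind(b"\n", start, end)
            if cut != -1:
                end = cut + 1  # keep the newline with its row
            else:
                # No newline in the window: extend to the end of this long line.
                nxt = data.find(b"\n", end)
                end = n if nxt == -1 else nxt + 1
        yield data[start:end]
        start = end
```

For example, `list(iter_chunks(b"aaa\nbb\ncccc\n", 5))` gives `[b"aaa\n", b"bb\n", b"cccc\n"]`, and joining the pieces always reproduces the original bytes.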

Apache combined log format is required...

Dave


Skhilled

Wow! Great! I know it was a challenge, but I sure hope that you get it approved.  ;D

Now I'll have to make sure my logs are in that format when I get a chance.

shawnb61

An Apache web access log entry is usually a space-delimited row of text. Fields that contain spaces are usually enclosed in double quotes ("), with square brackets ([]) as an alternate enclosure that seems to be used for datetimes. Blank fields are represented by a hyphen (-). An example:
52.173.235.85 - - [09/Sep/2025:00:00:47 -0400] "GET /smf/index.php?topic=30136.0 HTTP/2.0" 403 285 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot" 0 0 "on:TLSv1.3:TLS_AES_256_GCM_SHA384" 332 18454 194.255.155.15 wwwvoogerfoogercom - - 52.173.235.85
Breaking down this entry:

Field 1, IP: 52.173.235.85
Field 2, Client identity (identd), usually blank: -
Field 3, Requester (authenticated user), usually blank: -
Field 4, Datetime: [09/Sep/2025:00:00:47 -0400]
Field 5, Request: "GET /smf/index.php?topic=30136.0 HTTP/2.0"
Field 6, HTTP status: 403
Field 7, Size of response in bytes: 285
Field 8, Referrer: "-"
Field 9, User agent: "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot"
Field 10+, Varies per host; in this example, it appears to include connection cipher info: 0 0 "on:TLSv1.3:TLS_AES_256_GCM_SHA384" 332 18454 194.255.155.15 wwwvoogerfoogercom - - 52.173.235.85

WALA only uses the first nine columns; the rest are not loaded.
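A tokenizer following the rules described above (spaces delimit fields, "..." and [...] enclose fields with spaces, - means blank) might look like this minimal Python sketch. It's an illustration under assumptions, not WALA's parser: the field names are hypothetical labels taken from the breakdown above, quoted fields are returned without unescaping \" sequences, and everything past the ninth field is dropped, matching what WALA loads.

```python
# Minimal sketch, not WALA's actual parser. Field names are hypothetical,
# taken from the field-by-field breakdown of the combined format.
COMBINED_FIELDS = ["ip", "client", "requester", "datetime", "request",
                   "status", "size", "referrer", "useragent"]


def split_log_line(line):
    """Split one Apache-style log row into fields.

    Spaces delimit fields; "..." and [...] enclose fields that contain
    spaces. A bare '-' (quoted or not) marks a blank field and becomes None.
    Backslash-escaped characters inside quotes are skipped over when
    looking for the closing quote but are not unescaped.
    """
    fields = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch == " ":
            i += 1
        elif ch == '"':
            j = i + 1
            while j < n and line[j] != '"':
                j += 2 if line[j] == "\\" else 1
            fields.append(line[i + 1:j])
            i = j + 1
        elif ch == "[":
            j = line.find("]", i)
            j = n if j == -1 else j
            fields.append(line[i + 1:j])
            i = j + 1
        else:
            j = line.find(" ", i)
            j = n if j == -1 else j
            fields.append(line[i:j])
            i = j
    return [None if f == "-" else f for f in fields]


def parse_combined(line):
    """Map the first nine fields of a combined-format row to names; ignore the rest."""
    fields = split_log_line(line.rstrip("\r\n"))
    if len(fields) < 9:
        raise ValueError("expected at least 9 fields (Apache combined format)")
    return dict(zip(COMBINED_FIELDS, fields[:9]))
```

Run on the example entry, `split_log_line` finds 19 fields, and `parse_combined` returns status "403", size "285", and None for the client, requester and referrer blanks.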


Skhilled