Large arrays for matching

rsyslog-users mailing list
Hey folks,

What is the most efficient way to compare a parsed field ($!foo) against a
very large list of possible matches?  Are lookup tables faster than simply
doing something like this:

if $!foo == ["my", "big", "long", "...", "list"] then
/to/the/moon/alice.log;RAWLOG

Cheers,

JB
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

Re: Large arrays for matching

David Lang
Yes, lookup tables are FAR faster.

Re: Large arrays for matching

Rainer Gerhards
I wouldn't say so. The array comparison in the if does a binary search...
Rainer

Sent from phone, thus brief.

Re: Large arrays for matching

rsyslog-users mailing list
If the lookup tables are O(log n), what are the arrays, from a cost
perspective?

Cheers,

JB

Re: Large arrays for matching

Rainer Gerhards
O(log n) - binary search

Rainer
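
To put a rough number on that: a binary search over n sorted entries needs at most floor(log2 n) + 1 comparisons. Taking a list of 20,000 entries purely as an illustrative size (an assumption, not a measured figure):

```latex
\lfloor \log_2 20000 \rfloor + 1 = 14 + 1 = 15
```

Doubling the list to 40,000 entries adds only one more comparison in the worst case.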

Re: Large arrays for matching

David Lang
Binary searches are O(log n), so the two are very similar.

Lookup tables can be loaded from an external file and updated while rsyslog is
running, rather than having to change the config and restart.

Lookup tables also let you return a value per entry, so you do one lookup and
get a variable back that you can test to do different things, rather than
having to do the comparison multiple times.

If you are looking up IP addresses, look at the sparse tables. They let you have
a table with one entry per block (the first IP in the block) and match everything
in that block, rather than having to list every IP address separately.

David Lang
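
As a rough sketch of the points above, a lookup table could be wired in along these lines. The table name `watchlist`, the file path, the value strings and the second log path are made up for illustration; `$!foo`, `RAWLOG` and `/to/the/moon/alice.log` come from the original question. Parameter names follow the rsyslog lookup table documentation as I understand it, so check them against the version you run.

```
# Assumed table file /etc/rsyslog.d/watchlist.json (hypothetical name and values):
# { "version": 1,
#   "nomatch": "none",
#   "type": "string",
#   "table": [
#     { "index": "my",   "value": "moon" },
#     { "index": "big",  "value": "moon" },
#     { "index": "long", "value": "elsewhere" }
#   ] }

# Load the table from the external file. With reloadOnHUP enabled, a SIGHUP
# re-reads the file without editing the rest of the config.
lookup_table(name="watchlist" file="/etc/rsyslog.d/watchlist.json" reloadOnHUP="on")

# One lookup per message; the returned value is kept in a local variable.
set $.list = lookup("watchlist", $!foo);

# Branch on the returned value instead of repeating the comparison.
if $.list == "moon" then {
    action(type="omfile" file="/to/the/moon/alice.log" template="RAWLOG")
} else if $.list == "elsewhere" then {
    action(type="omfile" file="/var/log/elsewhere.log" template="RAWLOG")
}
```

Keys that are not in the table come back as the `nomatch` string, so neither branch fires for them.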

Re: Large arrays for matching

rsyslog-users mailing list
So if I had, hypothetically, 20,000 disparate IP addresses I wanted to do
inline match/nomatch, a plain array would be the same, performance wise, as
the lookup tables, right (O(log n))?

Are there any funky tricks I could do with workers/threading on the main queue
to increase throughput if I'm doing tons of inline comparisons where n is a
large number?  When I do these large comparisons, I see the main queue backing
up.  Any idea how to spread the load out over more workers/CPUs?  The box
has a ton of CPU and memory.

Cheers,

JB



Re: Large arrays for matching

David Lang
On Fri, 7 Jul 2017, Joe Blow wrote:

> So if I had, hypothetically, 20,000 disparate IP addresses I wanted to do
> inline match/nomatch, a plain array would be the same, performance wise, as
> the lookup tables, right (O(log n))?

yes.

> Are there any funky tricks I could do with workers/threading on the main queue
> to increase throughput if I'm doing tons of inline comparisons where n is a
> large number?  When I do these large comparisons, I see the main queue backing
> up.  Any idea how to spread the load out over more workers/CPUs?  The box
> has a ton of CPU and memory.

If this becomes the bottleneck (measure first, O(log n) scales _really_ well),
you can have multiple worker threads processing the messages.

If you are dealing with 20,000 IP addresses, first make sure you only have to do
the lookup once per log message (i.e. not having multiple if statements checking
whether it's part of that subset).

Then look at your IP addresses. I'll bet that you have a bunch of cases where
a run of consecutive IP addresses returns the same value. That will
let you use a sparse array so that each run becomes a single entry (or at worst
two entries, one for the beginning and one just after the end), which will
significantly shrink your table size.

Table lookup was designed to be able to do geo-IP table lookups; you are dealing
with much smaller lists.

David Lang
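
A sparse table for one such run might look roughly like the sketch below. It assumes the `sparseArray` table type returns the value of the nearest index at or below the key, and that your rsyslog version provides the `ip42num()` function; the table name `ipblocks`, the file path, the example block 10.1.2.0 to 10.1.2.255 and the value strings are all hypothetical. `$!shost1` is one of the field names from this thread.

```
# Assumed table file /etc/rsyslog.d/ipblocks.json (hypothetical). Indexes are
# IPv4 addresses written as integers:
#   10.1.2.0 = 167838208   (first address of the block to match)
#   10.1.3.0 = 167838464   (first address just after the block)
# { "version": 1,
#   "nomatch": "no",
#   "type": "sparseArray",
#   "table": [
#     { "index": 167838208, "value": "yes" },
#     { "index": 167838464, "value": "no" }
#   ] }

lookup_table(name="ipblocks" file="/etc/rsyslog.d/ipblocks.json")

# ip42num() turns a dotted-quad string into its integer form, which is then
# used as the key into the sparse table.
set $.blk = lookup("ipblocks", ip42num($!shost1));

if $.blk == "yes" then {
    action(type="omfile" file="/to/the/moon/alice.log")
}
```

Two entries cover all 256 addresses in the block: one marks where the run starts and one marks the first address after it.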

Re: Large arrays for matching

rsyslog-users mailing list
Right now there are 4 fields per log line that I have to check,
so I was doing:

if $!dhost1 == '''+biglist+''' or $!dhost2 == '''+biglist+''' or $!shost1
== '''+biglist+''' or $!shost2 == '''+biglist+''' then
/to/the/moon/alice.log

For workers, I was doing stuff like this:

queue.dequeuebatchsize="5000"
queue.workerthreads="32"
queue.workerthreadminimummessages="10000"

Any suggestions for the best way to spread the load out over workers?  I'm
doing about 80k EPS on the box.

Cheers,

JB


Re: Large arrays for matching

David Lang
On Fri, 7 Jul 2017, Joe Blow wrote:

> Right now there are 4 fields per log line that I have to check,
> so I was doing:
>
> if $!dhost1 == '''+biglist+''' or $!dhost2 == '''+biglist+''' or $!shost1
> == '''+biglist+''' or $!shost2 == '''+biglist+''' then
> /to/the/moon/alice.log

That's ugly (having to look up four different things).

> For workers, I was doing stuff like this:
>
> queue.dequeuebatchsize="5000"
> queue.workerthreads="32"

This is probably a significant problem. Back it down and only increase it if you
find that a thread is maxing out the CPU. Having too many threads can cripple
you (the other big thing is an inappropriate dynafilecachesize).

> queue.workerthreadminimummessages="10000"

> Any suggestions for the best way to spread the load out over workers?  I'm
> doing about 80k EPS on the box.

"Premature optimization is the root of all evil." The first step is to measure
what's going on and look at the full config.

Any one message will be processed in a single thread; multiple threads work on
different messages in parallel.

Do you have impstats data?

Look at top and enable per-thread reporting ('H'). Make sure that every queue has
a name, and show us what top looks like while running under load (we just need to
see the rsyslog threads).

David Lang
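
For anyone following along, a minimal measurement setup in the spirit of this advice might look like the sketch below. The interval, the stats file path, the worker count and the action name `alice_file` are placeholder assumptions, not recommendations from the thread; only the log path comes from the original posts.

```
# Write rsyslog's internal counters (queue sizes, enqueued/discarded
# messages, and so on) to their own file once a minute, outside the
# normal message flow.
module(load="impstats" interval="60" severity="7"
       log.syslog="off" log.file="/var/log/rsyslog-stats.log")

# Start with a modest number of workers and raise it only if per-thread
# top output shows a worker pinned at full CPU. "4" is a placeholder.
main_queue(queue.workerThreads="4")

# Name the action explicitly so its counters are easy to pick out in the
# stats output.
action(name="alice_file" type="omfile" file="/to/the/moon/alice.log")
```

Per-thread CPU usage can then be watched with `top -H -p <pid of rsyslogd>` while the box is under load.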