Hey, wanted to run a scenario by you as head Splunk Cloud Ninja.
We're big Splunk users in our IT department, use it for all our Web assets. We recently started a team putting together some stuff in the cloud. I have an application that has some parts on Amazon EC2 (both Linux and Windows) and some parts on Microsoft Azure. Both clouds dynamically scale, and we may eventually make globally distributed instances (AMER/EMEA/APAC) for fast app response to our international branches.
So now we're looking to set Splunk up on this, and I hoped to get your thoughts on what this architecture might look like. Network streaming in the cloud generally seems to be suboptimal: it's very lossy, and the dynamic IP thing makes it a pain. So the architecture we were considering was light forwarders on each cloud server (pulling app logs, system logs, Web logs), which would forward to a forwarder on a colocated management server (we have a management tier in each "cloudlet"). That forwarder would perhaps weed stuff out to minimize cost across the wire, then forward back to an index/search server either on premise or maybe in EC2, depending on how slow our IT department is.
In other words, in e.g. Amazon AMER,
in Azure AMER,
on prem or somewhere
Would this work, and does it make sense in a cloud use case? Can you forward to a forwarder? We'd like to do a little aggregation on each "cloudlet" just so the server doesn't have to be able to accept incoming connections from a zillion dynamic IP addresses. Would this be reasonably reliable? We don't currently see the need to do duplicate sending to multiple index servers or whatnot, at least not unless this scheme ends up losing data a lot.
Advanced topic - can a forwarder tier be scaled safely? So if we scaled the management tier in each cloudlet that's running the forwarder, would the individual nodes be fine with just forwarding through a random one of them?
Thanks!
Reply by Michael Wilde on March 7, 2011 at 9:46pm
You're pretty much right on. Be careful of the "forwarding through one node" idea: you can end up with a single point of failure. Instead of having a bunch of "intermediate forwarders" (which you can do), consider adding a few more indexers. You will spread the load across multiple indexers, and when search time comes around, you'll benefit from all those extra cores, so search should be faster.
Let's continue the discussion!
Reply by Ernest Mueller on March 8, 2011 at 7:38am
Thanks for the response!
The only downside to that is that then all those indexers have to be exposed publicly to do the federated search - well, and that they all require an additional license, right? One of the bugbears of doing cloud is that we bring these environments up and down frequently and scale each tier frequently and anywhere a license is required pretty much brings that to a screeching halt. Also if we bring an environment down all its logs would be lost - maybe that's OK, and we could take steps if we wanted to keep them - but then there's also the problem of having the central splunk server know what dynamic indexers are in place out in the cloud. Is there a clever solution for that?
To be more clear about the kind of architecture, so we have one application running in e.g. Amazon in AMER. But there are multiple instances of it (dev, test, production). We often 'tear down' a dev environment, for example, and build up a new one via model driven automation. Then we might have a separate app, or even part of that same app (sadly, but due to issues above my pay grade) in Azure AMER as well. (I suspect trying to run an actual indexer on Azure would be a losing battle because there is a very low limit on the number of open ports ("endpoints") you can have on an Azure instance).
And then of course we hope to scale up eventually, having a mirrored installation in EMEA/APAC/etc. We do have a "core" set of servers on each provider/geolocation that don't go up and down all the time, where something like an indexer might live; we use those for replicated LDAP/DBs and other such semi-centralized assets needing greater stability.
On the central server knowing about multiple indexers: we have an automated provisioning setup that can probably tell Splunk when they are there or gone, depending on how easy that is (ideally via an API, failing that via automated config file parsing).
On forwarding through one node, if we scaled up our management tier where the intermediate forwarders live, is it "safe" to just load balance through a farm of them? (We may need to scale up our management tier anyway for the other stuff - registry, monitoring, etc - that lives on it).
Reply by Michael Wilde on March 8, 2011 at 12:40pm
Let's assume you're a licensed customer in this case. In 4.2, while an indexer/search head does need a license, that can all be managed centrally now. When you're doing distributed search, only a single port (8089) would need to be open from server to server. In your browser, you'd connect to one Splunk server, and that local splunkd will dispatch jobs to the other Splunk servers responsible for indexing. Should work just fine.
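As a rough sketch of how an automated provisioning setup might keep the search head's peer list current, the Python below renders a `distsearch.conf` fragment from a list of indexer hostnames. The function name and the hostnames are hypothetical. The `[distributedSearch]` stanza with a `servers` key is an assumption based on my understanding of that file, and the exact peer syntax differs between Splunk versions, so check the documentation for yours. Port 8089 is the server-to-server port mentioned above.

```python
def render_distsearch_conf(indexers, mgmt_port=8089):
    """Render a distsearch.conf fragment listing the given search peers.

    `indexers` is an iterable of hostnames (hypothetical inventory data).
    Duplicates are dropped and the peers are sorted so that the output is
    stable from one provisioning run to the next.
    """
    peers = sorted({f"{host}:{mgmt_port}" for host in indexers})
    if not peers:
        raise ValueError("at least one indexer is required")
    # Assumed layout: one stanza, one comma-separated `servers` line.
    return "[distributedSearch]\nservers = " + ", ".join(peers) + "\n"


if __name__ == "__main__":
    # Hypothetical hostnames for the stable "core" servers in two cloudlets.
    print(render_distsearch_conf([
        "idx-amer-ec2.example.internal",
        "idx-emea-ec2.example.internal",
    ]))
```

Regenerating the whole file from the inventory on every provisioning run, rather than patching it in place, avoids stale peers lingering after an environment is torn down.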
Dunno if you know about Deployment Server -- it comes with the licensed version of Splunk and can be used to control configuration of remote splunk instances, whatever their role.
Yes, you could load balance through a farm of forwarders (Splunk has AutoLB, so you could set it to choose a different forwarder every N seconds if you like).
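To make the AutoLB idea concrete, here is a minimal sketch that renders an `outputs.conf` fragment for the light forwarders, pointing them at a farm of intermediate forwarders. The function name, group name, hostnames, and the receiving port 9997 are all illustrative assumptions. The keys used (`defaultGroup`, `server`, `autoLB`, `autoLBFrequency`) are `outputs.conf` settings as I understand them, but verify them against the spec file shipped with your Splunk version.

```python
def render_outputs_conf(group, forwarders, lb_seconds=30):
    """Render an outputs.conf fragment that load-balances across forwarders.

    `forwarders` is a list of (host, port) pairs for the intermediate
    forwarders; `lb_seconds` is how often the sender picks a new target
    (the "every N seconds" in the reply above).
    """
    if not forwarders:
        raise ValueError("at least one forwarder is required")
    servers = ", ".join(f"{host}:{port}" for host, port in forwarders)
    lines = [
        "[tcpout]",
        f"defaultGroup = {group}",
        "",
        f"[tcpout:{group}]",
        f"server = {servers}",
        "autoLB = true",
        f"autoLBFrequency = {lb_seconds}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical management-tier hosts in one cloudlet.
    farm = [("mgmt1.example.internal", 9997), ("mgmt2.example.internal", 9997)]
    print(render_outputs_conf("mgmt_tier", farm, lb_seconds=30))
```

Listing every member of the farm in `server` means losing one intermediate forwarder does not cut off the senders, which addresses the single-point-of-failure concern raised earlier in the thread.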
© 2012 Created by Michael Wilde.
