Hi
I have just started implementing Splunk for some of our application logging, and while most logs seem to be working well, we have a small issue with some XML messages.

I say messages because the XML-RPC for a particular system is logged in individual files rather than a single log, and around 40-50k files are produced each day.

To increase indexing speed and reduce disk space, I think we need to alter the way Splunk indexes the files. I assume we need to index the entire contents of each file by setting segmentation to ignore fields within the file, but I cannot find a good example of a related config anywhere on the web.

The files contain a text header, an XML-RPC request, another text line (the HTTP response code), and the response XML-RPC. We are not interested in stats on the contents, just in having the contents indexed so we can locate message times.

Don't suppose someone could give me a pointer as to how I can configure this?

Thanks in advance

Tags: rpc, segmentation, xml


Replies to This Discussion

Couple o' Questions for ya!

Is each file a message?
Is there a timestamp in the message?
Is the created date on the file the time the event occurred?
What sourcetype is Splunk assigning when it indexes the files?
OK, so the format is as follows: text on line 1, a cert ID, then XML, followed by the captured response XML, which is then duplicated for the transmission. That's subject to change if the system is unable to respond because it is down, etc. (The lines of hyphens are not in the files.)

1. Each file is a record of the transaction into the system, the response from the upstream system, and the reply back out.
2. The timestamp occurs within the XML messages.
3. The files are created once the transactions complete or fail.
4. sourcetype=xml-too_small

This is purely an example and does not contain any data from our systems.
I had to replace the chevrons with hashes as the site doesn't like them.
Thanks for your response; your help is most appreciated.

--
message from client for system to forward to service:
Client-Cert: certid-xxxxxxxx
xml: #ez:BusinessTransaction xmlns:ez="http://www.example.net"##ez:BusinessTransactionHeader operatorId="opID1234567890" operatorTransactionId="1234" operatorIssuedDate="2009-11-02T00:26:03" /##ez:BusinessTransactionBody##ez:operationCheck serviceNumber="01234567890" type="max" /##/ez:BusinessTransactionBody##/ez:BusinessTransaction#

response from service to client: 200
#?xml version="1.0" encoding="UTF-8"?#
#ez:BusinessTransaction xmlns:ez="http://www.example.net"#
#ez:BusinessTransactionHeader transactionCompletedDate="2009-11-02T00:05:56" transactionReceivedDate="2009-11-02T00:05:51" transactionId="123456" serviceIssuedDate="2009-11-02T00:05:56" operatorTransactionId="1234" operatorId="Media-1" /#
#ez:BusinessTransactionBody#
#ez:operationCheck type="max" serviceNumber="123456789" /#
#ez:operationCheckResult#
#ez:Product#
#ez:type speedMax="1111" type="Option 27" /#
#/ez:Product#
#ez:DeliveryDetails serviceNumber="123456789" typeCode="6ugev4" /#
#ez:Exchange code="r000000000022" name="box2" /#
#/ez:operationCheckResult#
#ez:Messages#
#ez:Message code="123-12345"#service activated#/ez:Message#
#/ez:Messages#
#/ez:BusinessTransactionBody#
#/ez:BusinessTransaction#

response from service to client: 200
#?xml version="1.0" encoding="UTF-8"?#
#ez:BusinessTransaction xmlns:ez="http://www.example.net"#
#ez:BusinessTransactionHeader transactionCompletedDate="2009-11-02T00:05:56" transactionReceivedDate="2009-11-02T00:05:51" transactionId="123456" serviceIssuedDate="2009-11-02T00:05:56" operatorTransactionId="1234" operatorId="Media-1" /#
#ez:BusinessTransactionBody#
#ez:operationCheck type="max" serviceNumber="123456789" /#
#ez:operationCheckResult#
#ez:Product#
#ez:type speedMax="1111" type="Option 27" /#
#/ez:Product#
#ez:DeliveryDetails serviceNumber="123456789" typeCode="6ugev4" /#
#ez:Exchange code="r000000000022" name="box2" /#
#/ez:operationCheckResult#
#ez:Messages#
#ez:Message code="123-12345"#service activated#/ez:Message#
#/ez:Messages#
#/ez:BusinessTransactionBody#
#/ez:BusinessTransaction#
--
OK, one more simple question... ultimately it seems like you'd just like each of these files indexed, preferably with a proper sourcetype.

Would you like the file as one event, or the responses split up into single events? And which field contains the valid timestamp you'd like to index on?

Answer that, and I should be able to give you a props.conf that makes it all happy.
Wow, you're more helpful than Splunk! You don't work for Splunk, do you?

Hmm, good question. It might be useful in the long run to have each XML message indexed separately, for reporting etc., but for now it would be great to just index the whole file and maybe look into breaking it up later if we need to. Timestamp indexing on the initial XML message is fine.

This is not going into production yet; we're indexing on a test box to get an idea of speed, disk space, etc.

I'm quite keen to learn Splunk as I can see it's powerful. I usually do analysis with Perl, but having instant graphing capability will be a huge improvement.

Thanks for your help
More helpful than Splunk? Well, that is why I started this community: I think it could be far better than the Splunk forums (which are buried), and possibly better than the best practices in the docs. But yes, I do work for Splunk. First sales engineer hired in 2006. Love the stuff and happy to help.

Damn! One more question for ya. (I'm really exact on this stuff, and so is Splunk: it is truly a time-series search engine, and it's important we get the timestamp perfect, as Splunk will index on that dimension.) In the XML message, which exact timestamp do you want? I see several:

transactionCompletedDate="2009-11-02T00:05:56"
transactionReceivedDate="2009-11-02T00:05:51"
serviceIssuedDate="2009-11-02T00:05:56"
operatorIssuedDate="2009-11-02T00:26:03"
Good question. The first message is not timestamped on receipt, so if there is a delay in transmission from the other side, that first XML message's timestamp may be inaccurate.

I think the timestamp in the second message, where we are forwarding it, is the best. That's where we handle the transaction, so that's what we need reporting on. Hopefully I can work out how to configure it further with some studying. I'm OK with regex if that's how it's configured, and I have my trusty O'Reilly regex manual to hand!

Yeah, I realise this is a very bad way of logging transactional messages. The company (which I probably can't mention) was doing this long before I got here; I have plans to improve the system though.

As previously mentioned, thanks for the help
The thing to remember about Splunk is... "it doesn't really matter what format you put the data in" -- as long as we can either use Splunk's own learning intelligence, or we give Splunk a hand with our brains, and a config file.

You should be cool doing this:

1. Manually sourcetype your input. I called mine "myxml". (This can be done in the GUI when you monitor the directory, or in the $SPLUNK_HOME/etc/apps/search/local/inputs.conf file.) Mine looks like this:
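
The stanza itself is missing from this copy of the thread. As a rough sketch only, a monitor input that assigns the sourcetype by hand might look like the following. The directory path is a hypothetical placeholder, not the one from the original post:

```
# inputs.conf (sketch; the monitored path below is an assumption)
[monitor:///opt/app/xmlrpc-messages]
sourcetype = myxml
disabled = false
```

Setting the sourcetype explicitly should stop Splunk from guessing a type such as xml-too_small for these small files.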



2. Configure some rules to determine how your new sourcetype should behave.
This can be done by editing "props.conf", which may or may not exist yet (it can be created), at $SPLUNK_HOME/etc/apps/search/local/props.conf. I set mine up with this config:
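
The props.conf stanza is also missing from this copy. One configuration that would be consistent with the result described in this post (three events per file, timestamp taken after the first Date=") is sketched below. The BREAK_ONLY_BEFORE pattern and the TIME_FORMAT value are assumptions inferred from the sample file earlier in the thread, not the author's actual settings:

```
# props.conf (sketch; values inferred from the sample, not from the original post)
[myxml]
# Merge lines into multi-line events, breaking only at the section labels
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^(message from client|response from service)
# Start the timestamp search right after the first Date=" in each event
TIME_PREFIX = Date="
TIME_FORMAT = %Y-%m-%dT%H:%M:%S
```

Against the sample file, each labelled section (the client message and the two responses) would become its own event. The client message would pick up operatorIssuedDate and each response would pick up transactionCompletedDate, since those are the first Date=" attributes in their sections.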


The result of this configuration shows three events. (FYI, my TIME_PREFIX setting looks for the first occurrence of Date=" and takes the first timestamp after that.)


Next up: pipe your search to the command " | transaction fields=operatorTransactionId " and you'll link up the events that have the same field value, in this case "operatorTransactionId". But why is this even cooler? Because the transaction search command calculates the duration between the first and last event... and then you can start getting real wacky by doing this: " | where duration>240 ". Pretty powerful, I think.
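
Put together, the pipeline might look like the sketch below. It assumes the myxml sourcetype from step 1 and that operatorTransactionId is extracted as a field at search time. Newer Splunk releases take the field name directly rather than through fields=, which is the form used here:

```
sourcetype=myxml
| transaction operatorTransactionId
| where duration > 240
```

The duration field is reported in seconds, so this keeps only the transactions that took longer than four minutes from first to last event.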

-michael
This works fantastically, thanks very much.

On another note, it seems that when I create a new index and then go to data inputs, I can select the new index in the drop-down but cannot save; I get an error at the top saying the index is not recognised.

I was able to resolve this by restarting Splunk from the CLI. Is this a known bug?

Thanks
When you create an index, you may not have noticed, but Splunk tells you at the top of the screen to restart, so you did exactly what you needed to.
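
For reference, the same index can also be defined by hand in indexes.conf. This is a minimal sketch in which the index name xmlrpc_test is a hypothetical placeholder:

```
# indexes.conf (sketch; the index name is an assumption)
[xmlrpc_test]
homePath   = $SPLUNK_DB/xmlrpc_test/db
coldPath   = $SPLUNK_DB/xmlrpc_test/colddb
thawedPath = $SPLUNK_DB/xmlrpc_test/thaweddb
```

In the version discussed here, a restart (for example "$SPLUNK_HOME/bin/splunk restart" from the CLI) is still needed before inputs can be saved against the new index.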


© 2012   Created by Michael Wilde.
