ProtocolDiscussion


SIQ Protocol Internet Draft submitted to IETF:

http://www.ietf.org/internet-drafts/draft-irtf-asrg-iar-howe-siq-02.txt
http://www.ietf.org/internet-drafts/draft-irtf-asrg-iar-howe-siq-01.txt
http://www.ietf.org/internet-drafts/draft-irtf-asrg-iar-howe-siq-00.txt

http://www.networksorcery.com/enp/Protocol.htm#S
Network Sorcery RFC Sourcebook lists SIQ Server Index Query


~ Feb 18th 2004 we decided to split the protocol used for SIQueries into two protocols, to avoid the conflicting goals presented by forcing it all into one protocol: high speed, anti-spoofing, and the server being able to trust clients to send accurate data.

SNMPForDummies: SNMP traps are proposed as the protocol for trusted-client data sent from client to server

http://sourceforge.net/projects/snmpy/ A python implementation of SNMP

http://www.wtcs.org/snmp4tpc/snmp_rfc.htm SNMP resources



Support for multiple scores for a domain IP pair --Robert Barclay, Fri, 22 Oct 2004 12:42:58 -0400
In reviewing the above draft, one thought that I think would significantly aid adoption and make the uses a little more flexible was the addition of an optional value to the query format specifying what type of score should be returned. If a reputation provider has multiple types of scores they can return, it would be useful for a querier to specify which score they want back. This could be done by providing all of the values as name=value pairs in the response, but there may be situations where this does not work well. Other mechanisms to achieve the same result would be always returning the data from which the score was derived in the response, or running multiple servers on different ports for each possible score, but neither of these seems ideal and neither seems to be a significant improvement over existing DNSBL response mechanisms. If a querier could specify at a more granular level than domain/IP what they want to know, you could add flexibility to the system without adding substantial overhead or complexity. Obviously the possible types of scores a provider may return, and their intended interpretation, will be dependent on the provider.

Queries that specify criteria to use or type of score to return --AprilDL, Sun, 24 Oct 2004 08:45:09 -0400
Hi Robert - let me see if I am understanding you correctly. If not, can you please give examples of queries and responses and their purpose.

I was strongly in favor of including a mechanism for the query client to control the processing that occurs in the query server. In fact the SIQ protocol does have this, although it wasn't named or described that way. Requests from you and others, as well as specific examples of the need and result, could get us to explain this usage in the protocol specification and/or make some other adaptation.

Please refer to section 3.1 UDP Query Format, under the description of "RD": "or other characters may be sent here to specify other than default processing" - The amount of space reserved for this results in a huge number of different choices for the type of processing or type of score you want returned.

I believe Anthony spoke against this because his view, and Derek's, as I understand them, is that a large number of independent SIQ servers exist in the world, that the query client might choose to query many different ones, and that they might not all offer or know the same RD codes for "how to process." Perhaps they have in mind a random, anonymous sort of query / response system, more like the blacklists of today.

My view, in contrast, is that the query client and query server KNOW EACH OTHER and have clear policies and agreements about what a request where RD="x" or y or z means in terms of this particular query client. In my view, the query client selects a particular SIQ server network which will give him the exact type of processing of queries which he wants.

Please give examples of the number of different kinds of scores you would want returned, or different "things you want to know", and let me know if my answer has demonstrated that what you want to do is already supported by the SIQ protocol.

Thank you,

  • April Lorenzen

SIQ client independence --Anthony Howe, Sun, 24 Oct 2004 09:35:29 -0400
April's response expresses things very well.

Essentially the SIQ protocol is designed to be ignorant of the SIQ server implementation. You don't want to put server specific implementation details into a protocol, unless you can clearly make a case that it is universally required. This allows for any SIQ client to query any SIQ server in a universal and consistent way. Think how DNS, SMTP, and POP are so universal in nature, yet have the possibilities for extension beyond the basic minimum.

The UDP protocol does provide for 3 standard types of scores. Originally I opted for one single score, but April convinced me of the utility of two additional types of scores (see the draft). A UDP packet has to be well defined, and allocating extra fields for server-specific details is not a good idea for maximum portability, unless it can be shown that all SIQ servers would implement the functionality demanded by new fields. The UDP format does allow for the RD section to be used for "other" data, though maybe this should be stated more clearly in the Rationale which discusses it.

An HTTP query and its response are more flexible for custom server-specific details. This is because extra HTTP headers can be easily added in both the query and response without impacting the standard elements returned. A basic SIQ server would ignore unknown / unsupported headers in a query from a specialised SIQ client and return a basic response, while an enhanced SIQ-server could just add supplemental X-SIQ-* headers to the response without impacting a basic SIQ client.
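The header-based extension idea above can be sketched roughly as follows. This is only an illustration: it assumes the response arrives as simple "Name: value" lines and that the standard scores appear under the names SCORE, IP-SCORE, DOMAIN-SCORE and REL-SCORE. The X-SIQ-Complaint-Rate header is made up for the example and is not taken from the draft.

```python
# Header names are assumptions for illustration; the draft defines the real ones.
STANDARD_FIELDS = ("SCORE", "IP-SCORE", "DOMAIN-SCORE", "REL-SCORE")


def parse_headers(raw):
    """Parse 'Name: value' lines into a dict keyed by upper-cased name."""
    headers = {}
    for line in raw.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(":", 1)
        headers[name.strip().upper()] = value.strip()
    return headers


def basic_client_view(headers):
    """A basic client keeps only the standard fields and ignores the rest."""
    return {k: headers[k] for k in STANDARD_FIELDS if k in headers}


def enhanced_client_view(headers):
    """An enhanced client additionally collects any X-SIQ-* extensions."""
    view = basic_client_view(headers)
    view.update({k: v for k, v in headers.items() if k.startswith("X-SIQ-")})
    return view


# Hypothetical response from an enhanced server.
response = "SCORE: 72\nIP-SCORE: 80\nX-SIQ-Complaint-Rate: 0.4\n"
print(basic_client_view(parse_headers(response)))
print(enhanced_client_view(parse_headers(response)))
```

Both clients read the same response; the basic one simply never looks at the extra header, which is the property that keeps the two interoperable.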

The nice thing about the current draft protocol is that different SIQ client implementations can work with any SIQ server implementation. I can install milter-siq and use Outbound-Index one day, then switch to another SIQ server if I'm not satisfied, or even use many SIQ servers at once and compare the results without having to change my installed software.

Anthony Howe

Re: SIQ client independence --Robert Barclay, Tue, 26 Oct 2004 17:46:25 -0400
Thanks for the responses. They clarify several things for me. I will (in a separate post for readability) provide some examples of what I was thinking. In relation to the dependencies between SIQ clients and servers I have a few observations.

First, the document says in several places that the method of deriving the scores, and their meanings, are outside the scope of the protocol. This seems on its face to contradict your desire to be able to interchangeably query SIQ servers and compare the results. Unless the scores all mean the same thing there is no way to guarantee that the comparison is meaningful unless you already know something about the provider of the score and their methods. It sounds from the above post that the goal is for a single entity to be able to publish a single score rating how likely they think it is that a receiver should accept email from a source. I think this is an unnecessary limitation and that this protocol could be used to provide much richer reputation data.

The protocols you gave, I think, provide some interesting examples of dealing with exactly the problem of multiple types of data being exchanged over a single protocol. Both SMTP and DNS have mechanisms for a client to specify exactly what type of data they are sending to, or expect to receive from, the server. Both protocols also have extension mechanisms built into them for clients and servers to exchange data outside of the published spec (through ESMTP commands, or through querying unassigned type codes in DNS).

In short I think the ability to use the protocol to more flexibly exchange elements of server reputation is one of the biggest potential benefits of SIQ over further expansion of existing DNSBL formats and is by itself valuable enough to have a specific extension mechanism within the UDP query. This is especially true if this can be done in a way that does not break the base functionality.

Flexibility --AprilDL, Wed, 27 Oct 2004 08:56:43 -0400
I hope that the protocol is flexible enough to accommodate a wide range of ideas of how to do reputation, identity, or other third party data about at least a domain and IP, extensible to some other data elements. I think there could certainly be networks of SIQ servers which conformed to the same extended functionality - thus the ability to try more than one server or compile results from more than one server (which I personally would have no use for - but others do), or have one server saying ask this other server.

Looking forward to the examples. My greatest hope is that although there may be many creative and different ways of using it, the protocol will eventually if not now be both flexible and boundaried enough to serve them all well.

Examples of client data impacting server response --Robert Barclay, Wed, 27 Oct 2004 13:23:35 -0400
Here are a few examples that I hope will clarify what I was envisioning. Based on the comments above I now think it is probably possible to achieve these results within the existing protocol, but I am not sure that they fit in the most efficient way, or that they don't break the interoperability goal listed above. I should note that these examples are not really theoretical; they are based on discussions I have had with both large receivers and large senders about how they would envision a reputation service working, and how they would like to use the data.

Example #1-

A large receiver already has a fairly substantial set of data on which to base a reputation, but it is all local. Their primary use for a reputation system is for additional data elements to add into their system, if the service has some unique or interesting new data. They would like to be able to choose specific data elements from the ones a reputation service offers and add those into their system. Within their query to the system they would specify the elements they are interested in. They could either query an individual data element and receive the score for that element along with some other data in the response, or they could query a set of elements and receive a score aggregated across those specific elements along with the individual scores in the text part of the response.

Example #2- A data aggregator allows email receivers to create their own custom score, deciding which elements to weight and how heavily. The score calculation is done by the aggregator and then queried by the receiver in their mail stream. The server needs a mechanism to know which custom score it should return, or if it should return some default calculated score. The default score in this case could be identical to existing SIQ implementations. In the case of a custom score the query format would vary somewhat but the response format would look identical (just with a different score).

Do these two help? If not I can probably come up with some others (though they may not be based on specific discussions I have had with potential consumers of reputation data).
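To make Example #2 a little more concrete, here is a minimal sketch of an aggregator choosing between a receiver's custom weighting and a default one. The element names, the weights, the receiver identifier and the weighted-average formula are all hypothetical; nothing here comes from the SIQ draft, which deliberately leaves score derivation unspecified.

```python
# Hypothetical element names and weights; neither comes from the SIQ draft.
DEFAULT_WEIGHTS = {"complaints": 1.0, "spam_trap": 1.0, "unknown_users": 1.0}


def composite_score(elements, weights):
    """Weighted average of the supplied elements, rounded to an integer."""
    total_weight = sum(weights.get(name, 0.0) for name in elements)
    if total_weight == 0:
        return None  # nothing to base a score on
    weighted = sum(value * weights.get(name, 0.0) for name, value in elements.items())
    return round(weighted / total_weight)


def score_for(receiver, elements, profiles):
    """Use the receiver's custom weights if it has any, else the defaults."""
    return composite_score(elements, profiles.get(receiver, DEFAULT_WEIGHTS))


# Per-element values (0..100) the aggregator holds for one domain/IP pair.
elements = {"complaints": 80, "spam_trap": 40}
# One receiver has tuned its profile to weight complaints three times as heavily.
profiles = {"receiver-a.example": {"complaints": 3.0, "spam_trap": 1.0}}

print(score_for("receiver-a.example", elements, profiles))  # custom weighting
print(score_for("receiver-b.example", elements, profiles))  # default weighting
```

Either way the caller gets back a single number in the same shape, which matches the point above that only the query needs to say which score is wanted.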

Re: SIQ client independence --Anthony Howe, Mon, 01 Nov 2004 04:21:58 -0500
§ Concerning the derivation of scores: the very fact that the SIQ protocol does NOT specify how scores should be computed allows for independence from any one SIQ implementation. All the SIQ client cares to see is a value between -1 and 100 for any of the 4 score categories. Different SIQ servers will have more or less data and different formulae from which to derive these scores, and their success and accuracy will depend on how good their data is and how well they combine it. Dedicated SIQ client/server combinations that are finely tuned to each other can always be created and those willing to use that sort of service are welcome to it. I personally prefer transparency and independence and minimal configuration.

By NOT specifying how the scores are generated here, several options are open: a separate specification can be put forward by IAR to cover scoring (either loosely or strictly); SIQ server implementers have the option to patent and/or charge for their service, so market forces should govern who has the better method. I think a specification on scoring would NOT be a good idea; it might limit a SIQ server implementer's options as to what they can do and how they might innovate, and more likely we could never generate consensus on how scoring should be done - if scoring were like a sports match, then it would be easy enough to create rules concerning how to score, but I think there are so many possible variables to consider in a reputation system that the best one can do is specify how the I/O should be presented (the SIQ protocol) and allow for extensions to fine tune the process for sites.

§ Concerning protocol tailored request/responses: Your two examples should be possible within the framework of the current protocol. Personally, I'm not sure I like the RCPT portion of the query being multi purpose, preferring to define that field, be it empty, and say what follows in the remainder of the packet is server specific query data. The HTTP version is certainly flexible enough to accommodate your desired request format and would be the recommended method. The Outbound Index tends to fall in the second example. I personally don't want to fuss with tuning a server's scoring system, having enough other things to tweak, so I'm not attracted to such features - IMHO the more accurate yet automated the process the more comfortable I am - if I don't like the results, I can change servers. This has been argued over many times by April and me.

Now something else that is being overlooked here, in particular with the UDP format, is the VERSION field. For maximum customization, we could modify the protocol to state that if bit 0 from the start of the packet (the high-order bit of the VERSION) is set, then packet bits 1-7 (or 1-32) are a SIQ server specific packet version code and the rest of the UDP packet is SIQ server specific. We could call on IANA to record/register these variants, if we used 1-32. If we used the shorter version code, packet bits 1-7, giving VERSION code range 128-255, then we simply state that a SIQ client/server have an intimate relationship in which they agree; a basic SIQ server receiving a custom request could simply reply with VERSION=1 SCORE=UNKNOWN.
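As a rough sketch of the shorter variant of this proposal: the code below assumes, as described above, that VERSION is the first byte of the packet and that its high-order bit marks a server-specific packet, giving codes 128-255. The dictionary used for the reply and the string "UNKNOWN" are placeholders for illustration only, not the draft's wire format.

```python
BASIC_VERSION = 1    # version a basic server answers with, per the proposal above
CUSTOM_FLAG = 0x80   # high-order bit of the first byte of the packet


def classify(packet):
    """Return ('custom', code) for VERSION 128..255, else ('standard', version)."""
    if not packet:
        raise ValueError("empty packet")
    first = packet[0]
    if first & CUSTOM_FLAG:
        return ("custom", first)
    return ("standard", first)


def basic_server_reply(packet):
    """A basic server cannot interpret custom packets, so it answers UNKNOWN.

    Returns None for standard packets, meaning "process normally"
    (normal processing is outside the scope of this sketch).
    """
    kind, _ = classify(packet)
    if kind == "custom":
        return {"VERSION": BASIC_VERSION, "SCORE": "UNKNOWN"}
    return None


print(classify(bytes([0x81, 0x00])))    # high bit set
print(classify(bytes([0x01, 0x00])))    # ordinary packet
print(basic_server_reply(bytes([200])))
```

A client and server with a private agreement would branch on the custom code instead of answering UNKNOWN; everything after the first byte is theirs to define.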

Further on the above examples --Robert Barclay, Tue, 02 Nov 2004 12:32:47 -0500
Here is a little more detail about example 1 above which hopefully will make it clearer. As I said, I now think that each of my examples is expressible within the current protocol. The issue is more about whether they are expressible in a way that does not break existing implementations or significantly interfere with interoperability. The reputation system has a set of data elements on which it is basing a score, which include spam trap hits, end user complaint data from a wide variety of sources, unknown user rate data from a wide variety of sources, and some basic traffic shaping type metrics (average volume from domain/IP pair, standard deviation, a list of IPs currently seen sending at far above their expected volume). The receiver already has a large volume of internally collected data, and a system to process that data in near real time. They are not interested in the composite score, but would like to know the complaint rate and the deviation from average send volume for the domain/IP pair. Each of these elements will be added to the data they have available internally and used to enhance their internal spam filtering. I would consider SIQ an ideal way to represent these individual data points (either as a value somehow compacted into the score, a rate, or more likely a percentile ranking). To achieve this, though, the receiver needs a way to embed into the query the list of data elements they are interested in (or optionally do multiple queries for each of the individual data elements).

Re: SIQ client independence --Robert Barclay, Tue, 02 Nov 2004 12:58:40 -0500
Concerning the derivation of scores: I agree that a specification on how scores are derived would not be beneficial. My concern about scores and transparency was not intended to mean that there should be a specification about how scores are derived. My concern was more that it was not just the formulae used to calculate the scores that would differ, but that the actual underlying meaning would differ (and I believe should be allowed to). In my example above I talked about using SIQ to exchange individual data elements that make up a sender's reputation rather than just a composite judgement, however that might be derived. It would clearly not make sense to directly compare one of these scores, which is only intended to be informative, against one that implies a judgement or an overall rating of some kind.

Concerning protocol tailored request/responses: I concur that both of these examples can be done within the existing framework. I just am not sure that it can be done in a completely sound way. I think it would make more sense (as I think you suggested) to have a separate portion of the query packet reserved for implementation specific data. Using the RCPT data portion may inadvertently break implementations that put the actual RCPT domain in this area, and requiring all of the requests to be done over HTTP is likely to be a pretty significant hurdle to convincing people to query your server. (On the second point I will admit that I am just guessing and would need to defer to people with expertise in getting SIQ querying implemented by various systems).

RCPT domain in query --AprilDL, Thu, 04 Nov 2004 11:52:27 -0500
I was really against this, esp against labelling it as the RCPT domain out of concern that people would stick to that being the only piece of data to put there. We did include language to allow anything there. My intention was that if we were hosting multiple domains on one inbound server, those domains would be able to select their own processing / scoring preferences - thus the need to tell the SIQ server the RCPT domain so that the SIQ server can process it according to that domain's wishes.

Sendmail libmilter presently isn't capable of handling a multi-recipient message as if there were a copy for each domain that could be handled differently - different header writing / accept / reject / etc. Doing so in an MS Exchange plug-in is also too time consuming for our first pass at developing one. So at present we accept that if you are sharing an MX, you are sharing the same SIQ processing - due to multi-rcpt messages.

Re: RCPT domain in query --Anthony Howe, Sat, 20 Nov 2004 03:51:42 -0500
§ Concerning the RCPT field: I'm thinking of conceding the point about the RCPT field naming. Given paranoid privacy concerns in Europe, it might be wiser to steer clear of such associations by calling it something completely different like EXTRA and documenting it further as to how it might be used for custom tailored requests between intimate SIQ client/server combos.

§ Concerning multiple RCPTs for a message: it's an awful problem to solve. If you make multiple queries for a single message, one per RCPT domain, then how do you deal with cases where some RCPT domains have tuned their scoring to say reject, others discard, still others say tag, and finally some say accept? In the case of reject/discard, you can simply drop a RCPT from the RCPT list, but in the case of tag and accept conflicts, there is no way to generate new messages per RCPT tailored to reflect their desires, especially in a pre-DATA milter. So you could implement some sort of majority rule, with ties being resolved by a sys.admin. configured choice.
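The majority-rule idea can be sketched as below. This is only one possible reading: the action names, the shape of the input (a mapping from RCPT domain to the action its scoring preferences ask for) and the default tie-breaker are assumptions made for illustration, not anything taken from milter-siq or the draft.

```python
from collections import Counter


def resolve(per_rcpt, tie_breaker="accept"):
    """Combine per-RCPT-domain actions into one action for the whole message.

    Recipients whose domain says 'reject' or 'discard' are dropped from the
    RCPT list. Among the rest, the most common action wins, and a tie falls
    back to the administrator-configured tie_breaker.
    Returns (kept_rcpt_domains, action); action is None if nobody is left.
    """
    kept = {d: a for d, a in per_rcpt.items() if a not in ("reject", "discard")}
    counts = Counter(kept.values())
    if not counts:
        return [], None
    top = max(counts.values())
    winners = [action for action, n in counts.items() if n == top]
    action = winners[0] if len(winners) == 1 else tie_breaker
    return sorted(kept), action


# Hypothetical message addressed to four domains sharing one MX.
wishes = {
    "a.example": "reject",
    "b.example": "tag",
    "c.example": "accept",
    "d.example": "tag",
}
print(resolve(wishes))
```

Whatever rule is chosen, some recipients get an outcome they did not ask for, which is the underlying limitation described above.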

§ Concerning the meaning and interpretation of scores: I think there should be a universal strict definition for the SCORE, an overall composite score that makes a judgement (how is not specified). The other supporting scores (IP-SCORE, DOMAIN-SCORE, REL-SCORE) used in the generation of the SCORE can and should have weaker definitions, since they will pertain to and reflect specific data collected by a SIQ server and possibly adjusted by user preferences. I think it's necessary to provide one SCORE field that goes out on a limb and makes a judgement, as this simplifies client side implementation for basic service. Refined client implementations could take the other scores and generate their own composite judgement and ignore the SCORE or make a comparison. Having one well defined SCORE that gives a judgement (-1..100) is already far better than a binary answer provided by blacklists - it allows for shades of grey that the client MX can work with and tune.

SIQ Response Scores --Anthony Howe, Sat, 20 Nov 2004 04:41:58 -0500
If additional scores, other than the SCORE and IP-SCORE, DOMAIN-SCORE, REL-SCORE, are required in the response, then please suggest what might be returned. My concern is that any protocol has to have standard elements that everyone can work with and that the UDP packet is limited in size. Also as you add more and more information to the response a) you need to justify it as being globally interesting, b) you start pushing the score computation work from server to client, c) paid-for services could end up inadvertently giving too much away to evil data-mining clients, and d) the questions of data privacy (how much the server knows, how much the client can find out).

Domain / IP: public or private? --Anthony Howe, Sat, 20 Nov 2004 05:06:53 -0500
Already some people think a domain/IP is too revealing (which begs the question of why they're on the Internet at all), but if you start giving too detailed a SIQ response, you may fall afoul of the EU Privacy Directive, which I've been trying to find more detailed information about, but so far this has been the best summary:

http://www.dss.state.ct.us/digital/eupriv.html

Apparently the original text is full of exceptions and may have had addendums since, but I haven't found any clear text as to how it impacts anti-spam filters. Also this document

http://www.loeb.com/CM/Articles/articles19.asp

is interesting too as it provides a summary of the Safe Harbor mechanism concerning US/EU relations.

I'm going to put forward this argument concerning a domain and IP:

a. An IP address is assigned temporarily (dynamic DNS) or long term (static), but it is something that has to be requested and allocated. Long term IP allocations are in the public record (but the ipwhois information has also been recently limited by RFC 3912).

b. A data subject or business rents a domain name for the point of simplifying how they are found on the Internet, this rental information (whois) is in the public record (though some elements of this have recently changed with RFC 3912).

Now an IP address is required if you want to be on the Internet, otherwise it's impossible to function, but in theory a domain name is optional, voluntary, and a clear indication of one's intent to be found for some purpose. Having an IP address is similar to having a street address; a domain name is similar to hanging a sign outside your shop with your business name on it or giving a name to your villa or farm. You need the IP, but you can live without a domain name (a return to 1970's Internet).

I would argue that an IP address is not as revealing about a data subject, especially now that RFC 3912 allows for ipwhois servers to limit what they reveal to the public with respect to their national privacy laws, such as real world address and phone numbers. So querying an IP-based blacklist reveals little or nothing about a data subject other than maybe their name.

If an IP can be considered public information and having a domain is considered to indicate some willingness to be more easily found on the Internet and therefore public information, then I see using a domain name and IP address in a query as not being able to communicate anything of value concerning a data subject that hasn't already been revealed by the data subject themselves.

In the case of email, as mentioned in the Security Consideration of the SIQ protocol draft:

Similar information already appears in message trace headers and those headers may have already been viewed and logged by intermediate MX servers during transit. Taking this perspective, the queries make use of information that may have already been revealed else where.

Domain spoofing and privacy --AprilDL, Sat, 20 Nov 2004 18:44:43 -0500
Let's say I get April-Lorenzen.com. Now Joe in France on a dialup sends an email claiming envelope-from something@April-Lorenzen.com. Somebody does queries about this. Nothing about his spoof or the queries about his spoof prove ANYTHING about whether or not I sent an email anywhere. So the fact that somewhere in the world, my domain - even if personal - was used in an email - doesn't personally identify ANYTHING about me.

Re: Domain spoofing and privacy --Anthony Howe, Sun, 21 Nov 2004 10:47:59 -0500
I think that might almost make some sort of sense in a Monty Python sort of way.

SIQ response scores --Robert Barclay, Thu, 02 Dec 2004 11:52:12 -0500
Not sure if you saw the goals I posted to the ASRG IAR list, but my intent there, as it has been here, is to try to express the variety of types of data I can see reputation systems generally trying to express. While I think most systems will produce a score of some kind, I know both from my own experience in developing a system and from talking to a pretty wide variety of people building other systems that that is far from the only thing people want to express, and sometimes the judgemental score is not even desired by the recipients of your data. Here is the goal I posted to the list, made up of data elements I know people already publish or want to publish:

6) Responses must be able to accommodate the following data types

a. Judgemental score (this sender is in the 99th percentile of good guys by my criteria)

b. Suggested action (block this sender, accept this sender)

c. Specific measured data (this sender sent 500K messages today)

d. A confidence rating for a score (I think this guy is bad but I have only seen enough mail to be 50% sure of it)

e. Multiple scores in a single response (here is the overall score for the domain you requested, and here is the score for that specific IP address when sending for that domain)

I also suggested that there should be a defined set of known data elements that can be published and a mechanism to extend that set (ideally both a private mechanism for interaction between systems that know each other and a public mechanism to extend the known set). None of this is meant to say that any specific reputation system will use more than a small set of the known data elements, of course.

Domain spoofing and Privacy --Robert Barclay, Thu, 02 Dec 2004 12:00:58 -0500
I was actually talking to Dave Anderson from Sendmail the other day about regional differences and their impact on what you can publish. Besides the legal issues, of which the EU directives are only one example, there are a number of regional differences that will affect systems that somehow attempt to tie accountability to a domain or domain IP pair. The conclusion I have come to is that reputation systems are going to end up being regional. If I limit the data I publish to things observed by recipients in the US, I limit (but may or may not remove) the possibility of being affected by international laws. But I also limit the scope of the mail I see to mail seen by recipients in the US, which means that I will end up not having terribly useful data about spammers who send in Korean to recipients in Korea, for example. On the specific issue you mentioned, I am sure it will come up in a court somewhere whether a domain/IP pair is personally identifiable enough to violate any data protection or privacy laws, but my gut feeling is similar to April's: that a domain and a person are different enough identities to avoid these problems.

Re: SIQ response scores --Anthony Howe, Thu, 09 Dec 2004 06:52:30 -0500
I've been following your ASRG IAR discussion concerning your suggested goals. I sort of like the idea of a "field registry" or field-discovery as a means of finding out what standard fields are available. Certainly the SIQ protocol still has plenty of room in the UDP response packet to add more fields. My biggest concern though is that it will be very hard to get consensus on what fields should become standard fields and how they will be presented in the packet (though layout issues should be simple enough to get around). Also a field registry/discovery step would add extra overhead if the results are not cached for a period of time, which means a TTL value will be necessary. If TCP were the only choice, life would be far easier since an HTTP-like query/response is so flexible. I think the existing services that are participating in IAR have to say more about what information they would like to see in responses. So far everyone is playing their cards close to their chest, which I find really sad. For a research group I would have expected more input from people. I haven't commented on anything myself, partly because no one has asked direct questions concerning SIQ, nor has there been sufficient input yet for me to make alterations to SIQ to support these ideas.
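The caching point can be illustrated with a small client-side sketch. No field-discovery step exists in the SIQ draft, so everything here is hypothetical: the class name, the fetch callable standing in for the discovery query, and the TTL being supplied locally rather than by the server.

```python
import time


class FieldRegistryCache:
    """Cache the result of a (hypothetical) field-discovery query for a TTL."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # callable performing the discovery query
        self._ttl = ttl_seconds    # how long a result may be reused
        self._clock = clock        # injectable so the cache can be tested
        self._fields = None
        self._expires_at = 0.0

    def fields(self):
        """Return the cached field list, re-fetching once the TTL has expired."""
        now = self._clock()
        if self._fields is None or now >= self._expires_at:
            self._fields = self._fetch()
            self._expires_at = now + self._ttl
        return self._fields


def discover():
    # Stand-in for a real discovery query to a SIQ server.
    return ["SCORE", "IP-SCORE", "DOMAIN-SCORE", "REL-SCORE"]


cache = FieldRegistryCache(discover, ttl_seconds=3600)
print(cache.fields())  # first call performs the discovery
print(cache.fields())  # second call within the TTL is served from the cache
```

With a cache like this the discovery overhead is paid once per TTL period instead of once per query, at the cost of clients seeing a changed field list late.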

This page was last edited 2 years ago by AprilDL.