Ruby input matching, validation and normalization pitfalls

Regular expressions in Ruby are multiline by default

A common pitfall in Ruby’s regular expressions is to anchor a pattern with ^ and $, which match at the start and end of every line, instead of \A and \z, which match only at the start and end of the whole string.

https://guides.rubyonrails.org/security.html#regular-expressions

# Vulnerable: ^ and $ match per line, so one valid-looking line is enough.
def valid_url?(url)
  return true if /^https?:\/\/[^\n]+$/i =~ url
  return false
end

valid_url? "https://example.com"
# => true
valid_url? "javascript:alert('xss')"
# => false
valid_url? "javascript:alert('xss');/*\nhttps://example.com\n*/"
# => true

# Fixed: \A and \z anchor the pattern to the whole string.
def valid_url?(url)
  return true if /\Ahttps?:\/\/[^\n]+\z/i =~ url
  return false
end

valid_url? "https://example.com"
# => true
valid_url? "javascript:alert('xss')"
# => false
valid_url? "javascript:alert('xss');/*\nhttps://example.com\n*/"
# => false
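
To see the anchors in isolation, here is a minimal sketch. The helper names and the sample strings are made up for illustration. It also shows that the uppercase \Z still tolerates one trailing newline, which is why \z is the safer choice for validation.

```ruby
# Hypothetical helpers that differ only in their anchors.
def line_anchored?(input)
  /^safe$/.match?(input)    # ^ and $ match at the boundaries of any line
end

def string_anchored?(input)
  /\Asafe\z/.match?(input)  # \A and \z match only at the string boundaries
end

def newline_tolerant?(input)
  /\Asafe\Z/.match?(input)  # \Z also matches just before a final newline
end

payload = "evil\nsafe" # illustrative two-line input

puts line_anchored?(payload)      # true
puts string_anchored?(payload)    # false
puts newline_tolerant?("safe\n")  # true
puts string_anchored?("safe\n")   # false
```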

Unicode Case Mapping Collisions

'admin@my_domaın.com'.upcase == 'admin@my_domain.com'.upcase
# => true
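
The collision arises because String#upcase applies full Unicode case mapping (Ruby 2.4 and later), so the dotless ı (U+0131) becomes a plain ASCII I. One possible mitigation, sketched below, is to restrict case conversion to ASCII letters with the :ascii option, and to reject non-ASCII identifiers where that is acceptable. The helper name same_identifier? is hypothetical, and the ASCII-only policy is an assumption that will not suit every application.

```ruby
# Hypothetical helper: compare two identifiers case-insensitively, but fold
# only ASCII letters so that look-alike characters stay distinct.
def same_identifier?(a, b)
  a.downcase(:ascii) == b.downcase(:ascii)
end

spoofed = "admin@my_doma\u0131n.com" # \u0131 is the dotless i
genuine = 'admin@my_domain.com'

puts spoofed.upcase == genuine.upcase                  # true, full Unicode mapping collides
puts same_identifier?(spoofed, genuine)                # false, ASCII-only folding keeps them apart
puts same_identifier?('ADMIN@My_Domain.com', genuine)  # true
puts spoofed.ascii_only?                               # false, a stricter policy could reject it
```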

Simple things can be trickier than you think

require 'uri'

# Naive check: passes for any URL that merely contains the trusted prefix.
def is_same_site(url)
  return true if url.include?('http://example.com')
  return false
end

valid_url?('http://example.com/')
# => true
is_same_site('http://example.com/')
# => true
URI.parse('http://example.com/').hostname
# => "example.com"

# Here "example.com" is parsed as userinfo; the real host is evil.com.
valid_url?('http://example.com@evil.com/')
# => true
is_same_site('http://example.com@evil.com/')
# => true
URI.parse('http://example.com@evil.com/').hostname
# => "evil.com"

# Requiring the trailing slash rejects the userinfo trick.
def is_same_site(url)
  return true if url.include?('http://example.com/')
  return false
end

is_same_site('http://example.com@evil.com/')
# => false
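
A substring check can still be satisfied by a URL that carries the trusted prefix somewhere other than the host, for example in the query string. A more defensive option, sketched here as an assumption rather than the original fix, is to parse the URL and compare the host itself. The names same_host? and TRUSTED_HOST are hypothetical.

```ruby
require 'uri'

TRUSTED_HOST = 'example.com' # hypothetical trusted host

# Returns true only when the URL is http(s) and its parsed host is the trusted one.
def same_host?(url, trusted_host = TRUSTED_HOST)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  uri.is_a?(URI::HTTP) && uri.host.to_s.downcase == trusted_host
rescue URI::InvalidURIError
  false
end

puts same_host?('http://example.com/')                       # true
puts same_host?('http://example.com@evil.com/')              # false
puts same_host?('http://evil.com/?next=http://example.com/') # false
```

Comparing the parsed host avoids reasoning about where in the string the trusted prefix appears, at the cost of depending on how the URI library interprets edge cases.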

Recursively delete substrings when sanitizing input

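'../../file.rb'.gsub('../', '')
# => "file.rb"
'....//....//file.rb'.gsub('../', '')
# => "../../file.rb"

# Repeat the substitution until nothing is left to remove.
def recursive_gsub(string)
  str = string.dup
  loop { break unless str.gsub!('../', '') }
  str
end

recursive_gsub('....//....//file.rb')
# => "file.rb"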
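
Stripping substrings is one way to handle path traversal. An alternative, shown here as a sketch and not as the article's method, is to resolve the requested path and check that the result still lies inside a base directory. BASE_DIR and resolve_inside are hypothetical names, and the sketch assumes the input does not start with "~", which File.expand_path would treat as a home directory.

```ruby
BASE_DIR = '/var/app/uploads' # hypothetical base directory

# Returns the absolute path if it stays inside base_dir, otherwise nil.
def resolve_inside(user_input, base_dir = BASE_DIR)
  base = File.expand_path(base_dir)
  full = File.expand_path(user_input, base) # collapses "." and ".." segments
  return full if full.start_with?(base + File::SEPARATOR)

  nil
end

p resolve_inside('file.rb')
p resolve_inside('../../etc/passwd') # nil, the resolved path escapes the base directory
```

File.expand_path only normalizes the string and does not follow symbolic links, so a real deployment may need an additional check on the resolved file.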

ReDoS

“The Regular expression Denial of Service (ReDoS) is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size). An attacker can then cause a program using a Regular Expression to enter these extreme situations and then hang for a very long time.”

https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS

/^([a-z]+)*$/ =~ "a!"
# completes instantly
/^([a-z]+)*$/ =~ "aaaaaaaaaaaaaaaaaaaaaaaaaaaa!"
# takes around 7 seconds to complete
/^([a-z]+)*$/ =~ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!"
# I killed the process after waiting around 1 minute
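
The slowdown comes from the nested quantifier: ([a-z]+)* can split a run of letters into groups in exponentially many ways, and a backtracking engine may try all of them before giving up on the trailing "!". A possible mitigation, sketched below under the assumption that the goal is simply to accept strings of lowercase letters, is to flatten the pattern to a single quantifier. The names LOWERCASE_ONLY and lowercase_only? are hypothetical. Ruby 3.2 and later also offer Regexp.timeout as a safety net; the guard keeps the sketch runnable on older versions.

```ruby
# Same intent as /^([a-z]+)*$/ (only lowercase letters), but with a single
# quantifier there is only one way to consume the input, so no blow-up.
LOWERCASE_ONLY = /\A[a-z]*\z/

# Optional safety net on Ruby 3.2+: a match that exceeds the limit raises
# Regexp::TimeoutError instead of hanging.
Regexp.timeout = 1.0 if Regexp.respond_to?(:timeout=)

def lowercase_only?(input)
  LOWERCASE_ONLY.match?(input)
end

puts lowercase_only?('abc')           # true
puts lowercase_only?('a' * 56 + '!')  # false
```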
